About Java: a deep dive into the features and internals of FST, a fast serialization tool that shrinks memory usage

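FST: concept and definition

FST stands for Fast Serialization Tool. It is a replacement implementation for Java serialization. The previous article pointed out two serious shortcomings of Java serialization, speed and output size, and FST improves considerably on both. Its main features are:

About 10 times faster than the serialization provided by the JDK, with output more than 3 to 4 times smaller
Support for off-heap Maps and for persisting off-heap Maps
Support for serializing to JSON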

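Using FST serialization

FST can be used in two ways: a shortcut API, or the more explicit ObjectOutput and ObjectInput.

Using the serialization and deserialization methods provided directly by FSTConfiguration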

import org.nustaq.serialization.FSTConfiguration;

public static void serialSample() {
    FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();
    User object = new User();
    object.setName("huaijin");
    object.setAge(30);
    System.out.println("serialization, " + object);

    // Serialize to a byte[] and back without handling any streams explicitly.
    byte[] bytes = conf.asByteArray(object);
    User newObject = (User) conf.asObject(bytes);
    System.out.println("deSerialization, " + newObject);
}

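FSTConfiguration also provides a way to register the classes of the objects to be serialized. If a class is not registered, its class name is written into the output by default. This API is both convenient and efficient: it returns a byte[] directly, without going through a ByteArrayOutputStream.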
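The cost of writing the class name can be illustrated with a toy sketch. It is not FST's wire format: `ToyClassRegistry`, its marker bytes, and its one-byte ids are all assumptions made for illustration. It only shows why a registered class, identified by a small id, produces a shorter header than an unregistered one, identified by its full name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy model of class registration. NOT FST's real encoding. */
final class ToyClassRegistry {
    private final Map<String, Integer> ids = new HashMap<>();

    /** Assigns the next free id to a class name (hypothetical helper). */
    void register(String className) {
        ids.putIfAbsent(className, ids.size() + 1);
    }

    /** Encodes only the part of a record that identifies the class. */
    byte[] encodeClassHeader(String className) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Integer id = ids.get(className);
        if (id != null) {
            out.write(1);               // marker: a registered id follows
            out.write(id);              // one byte is enough for this toy
        } else {
            byte[] name = className.getBytes(StandardCharsets.US_ASCII);
            out.write(0);               // marker: a class name follows
            out.write(name.length);     // name length (toy: fits in one byte)
            out.write(name, 0, name.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ToyClassRegistry registry = new ToyClassRegistry();
        int unregistered = registry.encodeClassHeader("com.fst.FSTBean").length;
        registry.register("com.fst.FSTBean");
        int registered = registry.encodeClassHeader("com.fst.FSTBean").length;
        System.out.println(unregistered + " bytes vs " + registered + " bytes");
    }
}
```

In this toy the saving is paid once per object header, so it matters most for streams made of many small objects.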

Using ObjectOutput and ObjectInput gives finer control over how data is written and read:

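import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.nustaq.serialization.FSTConfiguration;
import org.nustaq.serialization.FSTObjectInput;
import org.nustaq.serialization.FSTObjectOutput;

static FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();

static void writeObject(OutputStream outputStream, User user) throws IOException {
    FSTObjectOutput out = conf.getObjectOutput(outputStream);
    out.writeObject(user);
    // The output obtained from the configuration is reused, so FST's documentation
    // recommends flushing it and closing the underlying stream instead of closing it.
    out.flush();
    outputStream.close();
}

// The return type was FstObject in the original listing; the method returns a User.
static User readObject(InputStream inputStream) throws Exception {
    FSTObjectInput input = conf.getObjectInput(inputStream);
    User fstObject = (User) input.readObject(User.class);
    // Same reuse consideration as above: close the underlying stream only.
    inputStream.close();
    return fstObject;
}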

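FST in Dubbo

Dubbo wraps FstObjectInput and FstObjectOutput again to fix null-pointer problems during serialization and deserialization.
It also builds an FstFactory class that uses the factory pattern to create FstObjectInput and FstObjectOutput. The factory is combined with the singleton pattern so that there is a single FSTConfiguration for the whole application, and every class that needs to be serialized is registered with that FSTConfiguration at initialization.
To the outside, Dubbo exposes a unified serialization interface, FstSerialization, which provides the serialize and deserialize capabilities.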

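FST serialization and deserialization

FST serialized storage format

Almost every serialized object stored as bytes has a similar storage layout. Class files, so files, and dex files all look alike in this respect. There is little innovation in the format itself; at most, the field contents are compressed or optimized, which is also what the UTF-8 encoding we use every day does.

FST's serialized storage is no different from other byte-oriented storage schemes. Take the following FST serialized byte file as an example: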

00000000:  0001 0f63 6f6d 2e66 7374 2e46 5354 4265
00000010:  616e f701 fc05 7630 7374 7200

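Format:

Header | class name length | class name String | field 1 type (1 byte) | [length] | content | field 2 type (1 byte) | [length] | content | …

0000: type tag of the record; 00 denotes OBJECT
0001: class name encoding; 00 denotes UTF encoding, 01 denotes ASCII encoding
0002: length of class name (1 byte) = 15
0003~0011: class name string (15 bytes)
0012: Integer type tag, 0xf7
0013: value of the Integer = 1
0014: String type tag, 0xfc
0015: length of the String = 5
0016~001a: value of the String, "v0str"
001b~001c: END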
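The offset table can be checked by walking the dump by hand. The sketch below is a minimal reader for exactly this one record, following the table above; `FstDumpWalkthrough` is a hypothetical name, and this is not how FST itself parses a stream.

```java
import java.nio.charset.StandardCharsets;

/** Walks the example dump field by field, following the offset table. */
final class FstDumpWalkthrough {
    static final String HEX =
            "00010f636f6d2e6673742e4653544265"   // offsets 0x00 to 0x0f
          + "616ef701fc05763073747200";          // offsets 0x10 to 0x1b

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Returns "className|intValue|stringValue" for the example record. */
    static String decode(byte[] data) {
        int pos = 0;
        byte objectTag = data[pos++];          // 0x00: OBJECT
        byte nameEncoding = data[pos++];       // 0x01: ASCII class name
        int nameLength = data[pos++] & 0xff;   // 0x0f = 15
        String className = new String(data, pos, nameLength, StandardCharsets.US_ASCII);
        pos += nameLength;
        byte intTag = data[pos++];             // 0xf7, which is -9 as a signed byte
        int intValue = data[pos++];            // the small value fits in one byte
        byte stringTag = data[pos++];          // 0xfc, which is -4 as a signed byte
        int stringLength = data[pos++] & 0xff; // 5
        String text = new String(data, pos, stringLength, StandardCharsets.US_ASCII);
        if (objectTag != 0 || nameEncoding != 1 || intTag != -9 || stringTag != -4) {
            throw new IllegalArgumentException("unexpected tag in dump");
        }
        return className + "|" + intValue + "|" + text;
    }

    public static void main(String[] args) {
        System.out.println(decode(fromHex(HEX))); // prints com.fst.FSTBean|1|v0str
    }
}
```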

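As the layout above shows, the serialized Integer occupies only one byte (its value is 1) instead of the 4 bytes an int takes in memory, so the value has clearly been compressed according to some rule. For the details, see how FSTObjectInput#instantiateSpecialTag reads the different types. The type tags themselves are defined as byte constants in FSTObjectOutput: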

public class FSTObjectOutput implements ObjectOutput {

    private static final FSTLogger LOGGER = FSTLogger.getLogger(FSTObjectOutput.class);

    public static Object NULL_PLACEHOLDER = new Object() {
        public String toString() { return "NULL_PLACEHOLDER"; }
    };

    public static final byte SPECIAL_COMPATIBILITY_OBJECT_TAG = -19; // see issue 52
    public static final byte ONE_OF = -18;
    public static final byte BIG_BOOLEAN_FALSE = -17;
    public static final byte BIG_BOOLEAN_TRUE = -16;
    public static final byte BIG_LONG = -10;
    public static final byte BIG_INT = -9;
    public static final byte DIRECT_ARRAY_OBJECT = -8;
    public static final byte HANDLE = -7;
    public static final byte ENUM = -6;
    public static final byte ARRAY = -5;
    public static final byte STRING = -4;
    public static final byte TYPED = -3; // var class == object written class
    public static final byte DIRECT_OBJECT = -2;
    public static final byte NULL = -1;
    public static final byte OBJECT = 0;

    protected FSTEncoder codec;
    ...
}
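How an int can shrink to a single byte is easiest to see with a small variable-length encoder. The following is a sketch of one common scheme, in which small values take one byte and two reserved marker bytes announce a 2-byte or 4-byte value. The thresholds and marker values are assumptions chosen for illustration, and `CompactIntSketch` is a hypothetical name; the article does not show FST's actual encoder.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative variable-length int encoding; markers are assumptions. */
final class CompactIntSketch {
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value > -127 && value <= 127) {
            out.write(value);            // small value: a single byte
        } else if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            out.write(-128);             // marker: a 2-byte value follows
            out.write(value >>> 8);
            out.write(value);
        } else {
            out.write(-127);             // marker: a 4-byte value follows
            out.write(value >>> 24);
            out.write(value >>> 16);
            out.write(value >>> 8);
            out.write(value);
        }
        return out.toByteArray();
    }

    static int decode(byte[] data) {
        byte head = data[0];
        if (head == -128) {
            // Cast to short restores the sign of the 2-byte value.
            return (short) (((data[1] & 0xff) << 8) | (data[2] & 0xff));
        }
        if (head == -127) {
            return ((data[1] & 0xff) << 24) | ((data[2] & 0xff) << 16)
                    | ((data[3] & 0xff) << 8) | (data[4] & 0xff);
        }
        return head;                     // the byte itself is the value
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 300, 100000}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

The trade-off is typical of such schemes: large values cost one byte more than a fixed 4-byte int, in exchange for small values costing three bytes less.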

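How FST serialization and deserialization work

Serializing an Object to bytes amounts to persisting it. If the Bean definition has changed by the time the data is deserialized, the deserializer has to deal with compatibility. In JDK serialization, serialVersionUID plays an important role in version control. FST addresses the problem by ordering fields according to the @Version annotation.

During deserialization, FST first obtains all members of the object's Class through reflection and sorts them. This ordering is the key to compatibility and is the principle behind @Version. FSTClazzInfo defines a comparator, defFieldComparator, that sorts all Fields of a Bean: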

public final class FSTClazzInfo {

    public static final Comparator<FSTFieldInfo> defFieldComparator = new Comparator<FSTFieldInfo>() {
        @Override
        public int compare(FSTFieldInfo o1, FSTFieldInfo o2) {
            int res = 0;

            if (o1.getVersion() != o2.getVersion()) {
                return o1.getVersion() < o2.getVersion() ? -1 : 1;
            }

            // order: version, boolean, primitives, conditionals, object references
            if (o1.getType() == boolean.class && o2.getType() != boolean.class) {
                return -1;
            }
            if (o1.getType() != boolean.class && o2.getType() == boolean.class) {
                return 1;
            }

            if (o1.isConditional() && !o2.isConditional()) {
                res = 1;
            } else if (!o1.isConditional() && o2.isConditional()) {
                res = -1;
            } else if (o1.isPrimitive() && !o2.isPrimitive()) {
                res = -1;
            } else if (!o1.isPrimitive() && o2.isPrimitive())
                res = 1;
//          if (res == 0) // 64 bit / 32 bit issues
//              res = (int) (o1.getMemOffset() - o2.getMemOffset());
            if (res == 0)
                res = o1.getType().getSimpleName().compareTo(o2.getType().getSimpleName());
            if (res == 0)
                res = o1.getName().compareTo(o2.getName());
            if (res == 0) {
                return o1.getField().getDeclaringClass().getName().compareTo(o2.getField().getDeclaringClass().getName());
            }
            return res;
        }
    };
    ...
}

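The implementation shows that the comparison looks at the Field's Version first and at the Field's type second, so in general a Field with a higher Version sorts later. To see why the sorting is needed, look at the method FSTObjectInput#instantiateAndReadNoSer: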

public class FSTObjectInput implements ObjectInput {

    protected Object instantiateAndReadNoSer(Class c, FSTClazzInfo clzSerInfo, FSTClazzInfo.FSTFieldInfo referencee, int readPos) throws Exception {
        Object newObj;
        newObj = clzSerInfo.newInstance(getCodec().isMapBased());
        ...
        } else {
            FSTClazzInfo.FSTFieldInfo[] fieldInfo = clzSerInfo.getFieldInfo();
            readObjectFields(referencee, clzSerInfo, fieldInfo, newObj, 0, 0);
        }
        return newObj;
    }

    protected void readObjectFields(FSTClazzInfo.FSTFieldInfo referencee, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo[] fieldInfo, Object newObj, int startIndex, int version) throws Exception {
        if ( getCodec().isMapBased() ) {
            readFieldsMapBased(referencee, serializationInfo, newObj);
            if ( version >= 0 && newObj instanceof Unknown == false)
                getCodec().readObjectEnd();
            return;
        }
        if ( version < 0 )
            version = 0;
        int booleanMask = 0;
        int boolcount = 8;
        final int length = fieldInfo.length;
        int conditional = 0;
        for (int i = startIndex; i < length; i++) {  // note this loop
            try {
                FSTClazzInfo.FSTFieldInfo subInfo = fieldInfo[i];
                if (subInfo.getVersion() > version ) {   // the next version's fields start here
                    int nextVersion = getCodec().readVersionTag();  // next version found in the object stream
                    if ( nextVersion == 0 ) // old object read
                    {
                        oldVersionRead(newObj);
                        return;
                    }
                    if ( nextVersion != subInfo.getVersion() ) {  // a Field's version must never change, and it must stay in sync with the version in the stream
                        throw new RuntimeException("read version tag "+nextVersion+" fieldInfo has "+subInfo.getVersion());
                    }
                    readObjectFields(referencee,serializationInfo,fieldInfo,newObj,i,nextVersion);  // recurse into the next version
                    return;
                }
                if (subInfo.isPrimitive()) {
                    ...
                } else {
                    if ( subInfo.isConditional() ) {
                        ...
                    }
                    // object: set the value that was read on the new object through FSTFieldInfo
                    Object subObject = readObjectWithHeader(subInfo);
                    subInfo.setObjectValue(newObj, subObject);
                }
                ...

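The logic of this code is enough to understand how FST keeps serialization and deserialization compatible. Note the loop: it iterates over the Fields in sorted order, and each FSTFieldInfo records detailed information such as its own position in the object stream and its type:

Serialization:

All Fields of the Bean are sorted by Version (static and transient members are excluded). A Field without the @Version annotation defaults to version=0. Within the same version, Fields follow the order given in the comparator's comment: boolean, primitives, conditionals, object references
The Bean's Fields are written to the output stream one by one in the sorted order
A @Version value may only increase, never decrease. If a new Field reuses an existing version, the default ordering rules may make the Field order in the stream differ from the order of the in-memory FSTFieldInfo[] array, so values get injected into the wrong fields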
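The ordering rule can be tried out on a toy model. The sketch below is a simplified stand-in for defFieldComparator: `ToyField` is a hypothetical replacement for FSTFieldInfo, and the conditional and declaring-class checks are left out. It shows that a field added later with a higher version always lands at the end, whatever its type or name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified field ordering: version, then boolean, then primitives, then type name, then field name. */
final class FieldOrderSketch {
    /** Hypothetical stand-in for FSTFieldInfo with just enough data to sort on. */
    static final class ToyField {
        final String name;
        final Class<?> type;
        final int version;

        ToyField(String name, Class<?> type, int version) {
            this.name = name;
            this.type = type;
            this.version = version;
        }
    }

    static final Comparator<ToyField> ORDER = (a, b) -> {
        if (a.version != b.version) {
            return Integer.compare(a.version, b.version);   // lower versions first
        }
        boolean aBool = a.type == boolean.class;
        boolean bBool = b.type == boolean.class;
        if (aBool != bBool) {
            return aBool ? -1 : 1;                          // booleans first
        }
        if (a.type.isPrimitive() != b.type.isPrimitive()) {
            return a.type.isPrimitive() ? -1 : 1;           // primitives before objects
        }
        int byType = a.type.getSimpleName().compareTo(b.type.getSimpleName());
        return byType != 0 ? byType : a.name.compareTo(b.name);
    };

    static List<ToyField> sampleFields() {
        return Arrays.asList(
                new ToyField("added", String.class, 1),     // added in a later release
                new ToyField("y", String.class, 0),
                new ToyField("x", int.class, 0),
                new ToyField("flag", boolean.class, 0),
                new ToyField("count", long.class, 0));
    }

    static List<String> sortedNames(List<ToyField> fields) {
        List<ToyField> copy = new ArrayList<>(fields);
        copy.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (ToyField field : copy) {
            names.add(field.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(sampleFields())); // prints [flag, x, count, y, added]
    }
}
```

Had "added" been declared with version 0, it would have sorted before "y", shifting the position of an existing field; that is the reordering hazard described in the last item of the list above.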

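Deserialization:

Deserialization parses the object stream according to its format; the Field order stored in the stream matches the order of the in-memory FSTFieldInfo entries
A Field of the same version exists in the stream but is missing from the in-memory Bean: an exception may be thrown (a backward-compatibility problem)
The stream contains higher-version Fields that the in-memory Bean does not have: works normally (the old version is compatible with the new one)
A Field of the same version is missing from the stream but exists in the in-memory Bean: an exception is thrown
The same Field has different versions in the stream and in the in-memory Bean: an exception is thrown
The in-memory Bean adds a Field whose version is not higher than the current maximum version: an exception is thrown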

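The usage rule follows from the code logic above: whenever a Field is added, annotate it with @Version and set the value to the current maximum version plus one. Deleting Fields is not allowed.

The comment on the @Version annotation itself also states explicitly that it exists for backward compatibility: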
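The effect of this rule can be reproduced with a toy reader that mimics the version-tag handling in readObjectFields. Everything here is a simplification and an assumption for illustration: `VersionedReadSketch` is a hypothetical name, every field is a single byte with a value between 0 and 127, a schema is just the sorted array of field versions, and a trailing 0 marks the end of the record.

```java
import java.io.ByteArrayOutputStream;

/** Toy versioned record: v0 fields first, then a version tag before each newer group. */
final class VersionedReadSketch {
    /** Writes one byte per field, emitting a version tag whenever the version rises. */
    static byte[] write(int[] fieldVersions, int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0;
        for (int i = 0; i < fieldVersions.length; i++) {
            if (fieldVersions[i] > current) {
                current = fieldVersions[i];
                out.write(current);          // version tag
            }
            out.write(values[i]);
        }
        out.write(0);                        // end marker: no further version
        return out.toByteArray();
    }

    /** Reads a stream with the reader's own schema; missing newer fields keep the default 0. */
    static int[] read(int[] fieldVersions, byte[] stream) {
        int[] values = new int[fieldVersions.length];
        int pos = 0;
        int current = 0;
        for (int i = 0; i < fieldVersions.length; i++) {
            if (fieldVersions[i] > current) {
                int tag = stream[pos++];
                if (tag == 0) {
                    return values;           // old stream: newer fields stay at defaults
                }
                if (tag != fieldVersions[i]) {
                    throw new IllegalStateException(
                            "read version tag " + tag + " fieldInfo has " + fieldVersions[i]);
                }
                current = tag;
            }
            values[i] = stream[pos++];
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] oldStream = write(new int[] {0, 0}, new int[] {7, 8});
        int[] result = read(new int[] {0, 0, 1}, oldStream);
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // prints 7 8 0
    }
}
```

A reader whose new field carries version 2 while the stream was written with version 1 fails with the version-tag exception, which corresponds to the "versions differ" case in the list above.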

package org.nustaq.serialization.annotations;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD})
/**
 * support for adding fields without breaking compatibility to old streams.
 * For each release of your app increment the version value. No Version annotation means version=0.
 * Note that each added field needs to be annotated.
 *
 * e.g.
 *
 * class MyClass implements Serializable {
 *
 *     // fields on initial release 1.0
 *     int x;
 *     String y;
 *
 *     // fields added with release 1.5
 *     @Version(1) String added;
 *     @Version(1) String alsoAdded;
 *
 *     // fields added with release 2.0
 *     @Version(2) String addedv2;
 *     @Version(2) String alsoAddedv2;
 *
 * }
 *
 * If an old class is read, new fields will be set to default values. You can register a VersionConflictListener
 * at FSTObjectInput in order to fill in defaults for new fields.
 *
 * Notes/Limits:
 * - Removing fields will break backward compatibility. You can only Add new fields.
 * - Can slow down serialization over time (if many versions)
 * - does not work for Externalizable or Classes which make use of JDK-special features such as readObject/writeObject
 *   (AKA does not work if fst has to fall back to 'compatible mode' for an object).
 * - in case you use custom serializers, your custom serializer has to handle versioning
 *
 */
public @interface Version {
    byte value();
}

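import java.io.Serializable;

public class FSTBean implements Serializable {
    /** serialVersionUID */
    private static final long serialVersionUID = -2708653783151699375L;

    private Integer v0int;
    private String v0str;

    // Setters used by the serialization test; the original listing omitted them.
    public void setV0int(Integer v0int) { this.v0int = v0int; }
    public void setV0str(String v0str) { this.v0str = v0str; }
}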

Preparing the serialization and deserialization methods

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;

// FstSerializer is not shown in this article; it is assumed to expose
// serialize(Object) and deserialize(byte[], Class).
public class FSTSerial {

    private static void serialize(FstSerializer fst, String fileName) {
        // try-with-resources closes the file even if serialization or writing fails
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            FSTBean fstBean = new FSTBean();
            fstBean.setV0int(1);
            fstBean.setV0str("v0str");
            byte[] v1 = fst.serialize(fstBean);
            fos.write(v1, 0, v1.length);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void deserialize(FstSerializer fst, String fileName) {
        try (FileInputStream fis = new FileInputStream(fileName);
             ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            // Read the whole file into memory before handing it to the deserializer.
            byte[] buf = new byte[256];
            int length;
            while ((length = fis.read(buf)) > 0) {
                baos.write(buf, 0, length);
            }
            FSTBean deserial = fst.deserialize(baos.toByteArray(), FSTBean.class);
            System.out.println(deserial);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        FstSerializer fst = new FstSerializer();
        serialize(fst, "byte.bin");
        deserialize(fst, "byte.bin");
    }
}