Preface
This article takes a look at Compact Strings in Java 9.
Compressed Strings(Java 6)
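Java 6 introduced Compressed Strings: strings whose characters each fit in one byte were stored in a byte[], while strings that need two bytes per character continued to use a char[]. The feature was enabled with -XX:+UseCompressedStrings; it was deprecated in Java 7 and removed in Java 8.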
Compact Strings(Java 9)
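Java 9 introduced Compact Strings to replace Java 6's Compressed Strings. The implementation is more thorough: String now stores its characters in a byte[] instead of a char[] in every case, and a new field, coder, records whether those bytes are encoded as LATIN1 or UTF16.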
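The storage decision can be illustrated with a minimal sketch. The names below (CompactEncodingSketch, Encoded, encode) are hypothetical and are not JDK APIs, and the high-byte-first order used for the UTF16 case is an assumption made only for this sketch. It mirrors the idea rather than the JDK implementation: one byte per character when every character fits in Latin-1, two bytes per character otherwise.

```java
public class CompactEncodingSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Hypothetical holder for the encoded bytes plus the coder flag. */
    static final class Encoded {
        final byte[] value;
        final byte coder;

        Encoded(byte[] value, byte coder) {
            this.value = value;
            this.coder = coder;
        }
    }

    /** Uses one byte per char if every char is <= 0xFF, otherwise two bytes per char. */
    static Encoded encode(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            byte[] bytes = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            return new Encoded(bytes, LATIN1);
        }
        byte[] bytes = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            bytes[2 * i] = (byte) (c >> 8); // high byte first (assumed order)
            bytes[2 * i + 1] = (byte) c;    // low byte
        }
        return new Encoded(bytes, UTF16);
    }

    public static void main(String[] args) {
        Encoded ascii = encode("tom");
        Encoded mixed = encode("tom\u732B"); // "tom" followed by one CJK character
        System.out.println(ascii.coder + " " + ascii.value.length); // 0 3
        System.out.println(mixed.coder + " " + mixed.value.length); // 1 8
    }
}
```

A single character outside Latin-1 is enough to switch the whole string to two bytes per character, which is why the second string takes 8 bytes rather than 5.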
String
java.base/java/lang/String.java
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence,
Constable, ConstantDesc {
/**
* The value is used for character storage.
*
* @implNote This field is trusted by the VM, and is a subject to
* constant folding if String instance is constant. Overwriting this
* field after construction will cause problems.
*
* Additionally, it is marked with {@link Stable} to trust the contents
* of the array. No other facility in JDK provides this functionality (yet).
* {@link Stable} is safe here, because value is never null.
*/
@Stable
private final byte[] value;
/**
* The identifier of the encoding used to encode the bytes in
* {@code value}. The supported values in this implementation are
*
* LATIN1
* UTF16
*
* @implNote This field is trusted by the VM, and is a subject to
* constant folding if String instance is constant. Overwriting this
* field after construction will cause problems.
*/
private final byte coder;
/** Cache the hash code for the string */
private int hash; // Default to 0
/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;
/**
* If String compaction is disabled, the bytes in {@code value} are
* always encoded in UTF16.
*
* For methods with several possible implementation paths, when String
* compaction is disabled, only one code path is taken.
*
* The instance field value is generally opaque to optimizing JIT
* compilers. Therefore, in performance-sensitive place, an explicit
* check of the static boolean {@code COMPACT_STRINGS} is done first
* before checking the {@code coder} field since the static boolean
* {@code COMPACT_STRINGS} would be constant folded away by an
* optimizing JIT compiler. The idioms for these cases are as follows.
*
* For code such as:
*
* if (coder == LATIN1) { ... }
*
* can be written more optimally as
*
* if (coder() == LATIN1) { ... }
*
* or:
*
* if (COMPACT_STRINGS && coder == LATIN1) { ... }
*
* An optimizing JIT compiler can fold the above conditional as:
*
* COMPACT_STRINGS == true => if (coder == LATIN1) { ... }
* COMPACT_STRINGS == false => if (false) { ... }
*
* @implNote
* The actual value for this field is injected by JVM. The static
* initialization block is used to set the value here to communicate
* that this static final field is not statically foldable, and to
* avoid any possible circular dependency during vm initialization.
*/
static final boolean COMPACT_STRINGS;
static {
COMPACT_STRINGS = true;
}
/**
* Class String is special cased within the Serialization Stream Protocol.
*
* A String instance is written into an ObjectOutputStream according to
* <a href="{@docRoot}/../specs/serialization/protocol.html#stream-elements">
* Object Serialization Specification, Section 6.2, "Stream Elements"</a>
*/
private static final ObjectStreamField[] serialPersistentFields =
new ObjectStreamField[0];
/**
* Initializes a newly created {@code String} object so that it represents
* an empty character sequence. Note that use of this constructor is
* unnecessary since Strings are immutable.
*/
public String() {
this.value = "".value;
this.coder = "".coder;
}
//……
public char charAt(int index) {
if (isLatin1()) {
return StringLatin1.charAt(value, index);
} else {
return StringUTF16.charAt(value, index);
}
}
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String aString = (String)anObject;
if (coder() == aString.coder()) {
return isLatin1() ? StringLatin1.equals(value, aString.value)
: StringUTF16.equals(value, aString.value);
}
}
return false;
}
public int compareTo(String anotherString) {
byte v1[] = value;
byte v2[] = anotherString.value;
if (coder() == anotherString.coder()) {
return isLatin1() ? StringLatin1.compareTo(v1, v2)
: StringUTF16.compareTo(v1, v2);
}
return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
: StringUTF16.compareToLatin1(v1, v2);
}
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
hash = h = isLatin1() ? StringLatin1.hashCode(value)
: StringUTF16.hashCode(value);
}
return h;
}
public int indexOf(int ch, int fromIndex) {
return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex)
: StringUTF16.indexOf(value, ch, fromIndex);
}
public String substring(int beginIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
int subLen = length() - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
if (beginIndex == 0) {
return this;
}
return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
: StringUTF16.newString(value, beginIndex, subLen);
}
//……
byte coder() {
return COMPACT_STRINGS ? coder : UTF16;
}
byte[] value() {
return value;
}
private boolean isLatin1() {
return COMPACT_STRINGS && coder == LATIN1;
}
@Native static final byte LATIN1 = 0;
@Native static final byte UTF16 = 1;
//……
}
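COMPACT_STRINGS defaults to true, so the feature is enabled by default (it can be turned off with the JVM option -XX:-CompactStrings).
The coder() method returns the coder field when COMPACT_STRINGS is true and UTF16 otherwise; isLatin1() returns true only when COMPACT_STRINGS is true and coder is LATIN1.
Methods such as charAt, equals, hashCode, indexOf, and substring rely on isLatin1() to decide whether to delegate to StringLatin1 or StringUTF16.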
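The dispatch pattern described above can be sketched in isolation. Everything in this example is hypothetical: CoderDispatchSketch is not a JDK class, the COMPACT_STRINGS constant here is an ordinary field standing in for the JVM-injected flag, and the high-byte-first layout for UTF16 is an assumption. It only shows how a single isLatin1() check selects between a one-byte and a two-byte read.

```java
public class CoderDispatchSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Stand-in for the flag that the JVM injects into java.lang.String.
    static final boolean COMPACT_STRINGS = true;

    private final byte[] value;
    private final byte coder;

    CoderDispatchSketch(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }

    private boolean isLatin1() {
        return COMPACT_STRINGS && coder == LATIN1;
    }

    /** Number of characters: one byte each for LATIN1, two bytes each for UTF16. */
    int length() {
        return isLatin1() ? value.length : value.length / 2;
    }

    /** Reads one char, choosing the decoding path from the coder. */
    char charAt(int index) {
        if (isLatin1()) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;     // assumed high-byte-first layout
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        CoderDispatchSketch latin =
                new CoderDispatchSketch(new byte[] {'t', 'o', 'm'}, LATIN1);
        CoderDispatchSketch wide =
                new CoderDispatchSketch(new byte[] {0x73, 0x2B}, UTF16);
        System.out.println(latin.length() + " " + latin.charAt(1));                 // 3 o
        System.out.println(wide.length() + " " + Integer.toHexString(wide.charAt(0))); // 1 732b
    }
}
```

Because COMPACT_STRINGS is a static final boolean, a JIT compiler can fold the check away, which is the reason the Javadoc above recommends testing it before the coder field.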
StringConcatFactory
Example
public class Java9StringDemo {
    public static void main(String[] args) {
        String stringLiteral = "tom";
        String stringObject = stringLiteral + "cat";
    }
}
In this code, stringObject is built by concatenating the variable stringLiteral with the constant "cat".
javap
javac src/main/java/com/example/javac/Java9StringDemo.java
javap -v src/main/java/com/example/javac/Java9StringDemo.class
Last modified Apr 7, 2019; size 770 bytes
MD5 checksum fecfca9c829402c358c4d5cb948004ff
Compiled from "Java9StringDemo.java"
public class com.example.javac.Java9StringDemo
minor version: 0
major version: 56
flags: (0x0021) ACC_PUBLIC, ACC_SUPER
this_class: #4 // com/example/javac/Java9StringDemo
super_class: #5 // java/lang/Object
interfaces: 0, fields: 0, methods: 2, attributes: 3
Constant pool:
#1 = Methodref #5.#14 // java/lang/Object."<init>":()V
#2 = String #15 // tom
#3 = InvokeDynamic #0:#19 // #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
#4 = Class #20 // com/example/javac/Java9StringDemo
#5 = Class #21 // java/lang/Object
#6 = Utf8 <init>
#7 = Utf8 ()V
#8 = Utf8 Code
#9 = Utf8 LineNumberTable
#10 = Utf8 main
#11 = Utf8 ([Ljava/lang/String;)V
#12 = Utf8 SourceFile
#13 = Utf8 Java9StringDemo.java
#14 = NameAndType #6:#7 // "<init>":()V
#15 = Utf8 tom
#16 = Utf8 BootstrapMethods
#17 = MethodHandle 6:#22 // REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
#18 = String #23 // \u0001cat
#19 = NameAndType #24:#25 // makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
#20 = Utf8 com/example/javac/Java9StringDemo
#21 = Utf8 java/lang/Object
#22 = Methodref #26.#27 // java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
#23 = Utf8 \u0001cat
#24 = Utf8 makeConcatWithConstants
#25 = Utf8 (Ljava/lang/String;)Ljava/lang/String;
#26 = Class #28 // java/lang/invoke/StringConcatFactory
#27 = NameAndType #24:#32 // makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
#28 = Utf8 java/lang/invoke/StringConcatFactory
#29 = Class #34 // java/lang/invoke/MethodHandles$Lookup
#30 = Utf8 Lookup
#31 = Utf8 InnerClasses
#32 = Utf8 (Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
#33 = Class #35 // java/lang/invoke/MethodHandles
#34 = Utf8 java/lang/invoke/MethodHandles$Lookup
#35 = Utf8 java/lang/invoke/MethodHandles
{
public com.example.javac.Java9StringDemo();
descriptor: ()V
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 8: 0
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=3, args_size=1
0: ldc #2 // String tom
2: astore_1
3: aload_1
4: invokedynamic #3, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
9: astore_2
10: return
LineNumberTable:
line 11: 0
line 12: 3
line 13: 10
}
SourceFile: "Java9StringDemo.java"
InnerClasses:
public static final #30= #29 of #33; // Lookup=class java/lang/invoke/MethodHandles$Lookup of class java/lang/invoke/MethodHandles
BootstrapMethods:
0: #17 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#18 \u0001cat
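The javap output shows that, starting with Java 9, the concatenation is compiled to an invokedynamic instruction whose bootstrap method is StringConcatFactory.makeConcatWithConstants. Java 8, by contrast, compiled the same expression into a chain of StringBuilder calls.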
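To make the comparison concrete, here is a small sketch of the two forms side by side. The class and method names (ConcatEquivalenceSketch, viaStringBuilder, viaPlus) are made up for illustration; viaStringBuilder is written by hand to approximate what older compilers generated, so it is not taken from actual javac output.

```java
public class ConcatEquivalenceSketch {
    /** Hand-written approximation of the pre-Java 9 translation of s + "cat". */
    static String viaStringBuilder(String s) {
        return new StringBuilder().append(s).append("cat").toString();
    }

    /** Plain concatenation; the compiler decides how to translate it. */
    static String viaPlus(String s) {
        return s + "cat";
    }

    public static void main(String[] args) {
        String stringLiteral = "tom";
        System.out.println(viaStringBuilder(stringLiteral)); // tomcat
        System.out.println(viaPlus(stringLiteral).equals(viaStringBuilder(stringLiteral))); // true
    }
}
```

Both forms yield the same string. The difference is only in the bytecode: with invokedynamic the concatenation strategy is chosen at run time by the JDK rather than being fixed in the class file.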
StringConcatFactory.makeConcatWithConstants
java.base/java/lang/invoke/StringConcatFactory.java
public final class StringConcatFactory {
//……
/**
* Concatenation strategy to use. See {@link Strategy} for possible options.
* This option is controllable with -Djava.lang.invoke.stringConcat JDK option.
*/
private static Strategy STRATEGY;
/**
* Default strategy to use for concatenation.
*/
private static final Strategy DEFAULT_STRATEGY = Strategy.MH_INLINE_SIZED_EXACT;
private enum Strategy {
/**
* Bytecode generator, calling into {@link java.lang.StringBuilder}.
*/
BC_SB,
/**
* Bytecode generator, calling into {@link java.lang.StringBuilder};
* but trying to estimate the required storage.
*/
BC_SB_SIZED,
/**
* Bytecode generator, calling into {@link java.lang.StringBuilder};
* but computing the required storage exactly.
*/
BC_SB_SIZED_EXACT,
/**
* MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
* This strategy also tries to estimate the required storage.
*/
MH_SB_SIZED,
/**
* MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
* This strategy also estimate the required storage exactly.
*/
MH_SB_SIZED_EXACT,
/**
* MethodHandle-based generator, that constructs its own byte[] array from
* the arguments. It computes the required storage exactly.
*/
MH_INLINE_SIZED_EXACT
}
static {
// In case we need to double-back onto the StringConcatFactory during this
// static initialization, make sure we have the reasonable defaults to complete
// the static initialization properly. After that, actual users would use
// the proper values we have read from the properties.
STRATEGY = DEFAULT_STRATEGY;
// CACHE_ENABLE = false; // implied
// CACHE = null; // implied
// DEBUG = false; // implied
// DUMPER = null; // implied
Properties props = GetPropertyAction.privilegedGetProperties();
final String strategy =
props.getProperty("java.lang.invoke.stringConcat");
CACHE_ENABLE = Boolean.parseBoolean(
props.getProperty("java.lang.invoke.stringConcat.cache"));
DEBUG = Boolean.parseBoolean(
props.getProperty("java.lang.invoke.stringConcat.debug"));
final String dumpPath =
props.getProperty("java.lang.invoke.stringConcat.dumpClasses");
STRATEGY = (strategy == null) ? DEFAULT_STRATEGY : Strategy.valueOf(strategy);
CACHE = CACHE_ENABLE ? new ConcurrentHashMap<>() : null;
DUMPER = (dumpPath == null) ? null : ProxyClassesDumper.getInstance(dumpPath);
}
public static CallSite makeConcatWithConstants(MethodHandles.Lookup lookup,
String name,
MethodType concatType,
String recipe,
Object... constants) throws StringConcatException {
if (DEBUG) {
System.out.println("StringConcatFactory " + STRATEGY + " is here for " + concatType + ", {" + recipe + "}, " + Arrays.toString(constants));
}
return doStringConcat(lookup, name, concatType, false, recipe, constants);
}
private static CallSite doStringConcat(MethodHandles.Lookup lookup,
String name,
MethodType concatType,
boolean generateRecipe,
String recipe,
Object... constants) throws StringConcatException {
Objects.requireNonNull(lookup, "Lookup is null");
Objects.requireNonNull(name, "Name is null");
Objects.requireNonNull(concatType, "Concat type is null");
Objects.requireNonNull(constants, "Constants are null");
for (Object o : constants) {
Objects.requireNonNull(o, "Cannot accept null constants");
}
if ((lookup.lookupModes() & MethodHandles.Lookup.PRIVATE) == 0) {
throw new StringConcatException("Invalid caller: " +
lookup.lookupClass().getName());
}
int cCount = 0;
int oCount = 0;
if (generateRecipe) {
// Mock the recipe to reuse the concat generator code
char[] value = new char[concatType.parameterCount()];
Arrays.fill(value, TAG_ARG);
recipe = new String(value);
oCount = concatType.parameterCount();
} else {
Objects.requireNonNull(recipe, "Recipe is null");
for (int i = 0; i < recipe.length(); i++) {
char c = recipe.charAt(i);
if (c == TAG_CONST) cCount++;
if (c == TAG_ARG) oCount++;
}
}
if (oCount != concatType.parameterCount()) {
throw new StringConcatException(
"Mismatched number of concat arguments: recipe wants " +
oCount +
" arguments, but signature provides " +
concatType.parameterCount());
}
if (cCount != constants.length) {
throw new StringConcatException(
"Mismatched number of concat constants: recipe wants " +
cCount +
" constants, but only " +
constants.length +
" are passed");
}
if (!concatType.returnType().isAssignableFrom(String.class)) {
throw new StringConcatException(
"The return type should be compatible with String, but it is " +
concatType.returnType());
}
if (concatType.parameterSlotCount() > MAX_INDY_CONCAT_ARG_SLOTS) {
throw new StringConcatException("Too many concat argument slots: " +
concatType.parameterSlotCount() +
", can only accept " +
MAX_INDY_CONCAT_ARG_SLOTS);
}
String className = getClassName(lookup.lookupClass());
MethodType mt = adaptType(concatType);
Recipe rec = new Recipe(recipe, constants);
MethodHandle mh;
if (CACHE_ENABLE) {
Key key = new Key(className, mt, rec);
mh = CACHE.get(key);
if (mh == null) {
mh = generate(lookup, className, mt, rec);
CACHE.put(key, mh);
}
} else {
mh = generate(lookup, className, mt, rec);
}
return new ConstantCallSite(mh.asType(concatType));
}
private static MethodHandle generate(Lookup lookup, String className, MethodType mt, Recipe recipe) throws StringConcatException {
try {
switch (STRATEGY) {
case BC_SB:
return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.DEFAULT);
case BC_SB_SIZED:
return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.SIZED);
case BC_SB_SIZED_EXACT:
return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.SIZED_EXACT);
case MH_SB_SIZED:
return MethodHandleStringBuilderStrategy.generate(mt, recipe, Mode.SIZED);
case MH_SB_SIZED_EXACT:
return MethodHandleStringBuilderStrategy.generate(mt, recipe, Mode.SIZED_EXACT);
case MH_INLINE_SIZED_EXACT:
return MethodHandleInlineCopyStrategy.generate(mt, recipe);
default:
throw new StringConcatException("Concatenation strategy " + STRATEGY + " is not implemented");
}
} catch (Error | StringConcatException e) {
// Pass through any error or existing StringConcatException
throw e;
} catch (Throwable t) {
throw new StringConcatException("Generator failed", t);
}
}
//……
}
makeConcatWithConstants delegates to doStringConcat, which validates its arguments and then calls generate to produce a MethodHandle. generate chooses an implementation according to STRATEGY, which is one of BC_SB, BC_SB_SIZED, BC_SB_SIZED_EXACT, MH_SB_SIZED, MH_SB_SIZED_EXACT, or MH_INLINE_SIZED_EXACT. The default is MH_INLINE_SIZED_EXACT, and it can be overridden with -Djava.lang.invoke.stringConcat.
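The bootstrap method is a public API, so it can also be called by hand to see what the invokedynamic instruction does. The sketch below reuses the recipe from the javap output (the tag character U+0001 followed by "cat") and the same (String)String call-site type. The class name BootstrapCallSketch and the helper concatWithCat are invented for this example, and wrapping failures in IllegalStateException is just a convenience of the sketch.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class BootstrapCallSketch {
    /** Builds a call site for the recipe "<arg>cat" and invokes it once. */
    static String concatWithCat(String arg) {
        try {
            // Same shape as in the javap output: takes one String, returns String.
            MethodType concatType = MethodType.methodType(String.class, String.class);
            // U+0001 marks the position of the dynamic argument; "cat" is inlined.
            String recipe = "\u0001cat";
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),    // lookup of this class, has PRIVATE access
                    "makeConcatWithConstants", // name of the call site
                    concatType,
                    recipe);                   // no extra constants are needed here
            MethodHandle handle = site.dynamicInvoker();
            return (String) handle.invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException("concat bootstrap failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concatWithCat("tom")); // tomcat
    }
}
```

In compiled code the bootstrap runs once per call site and the resulting handle is reused, whereas this helper rebuilds the call site on every call, so it is useful only for exploring the API.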
Summary
Java 9 introduced Compact Strings to replace Java 6's Compressed Strings. The implementation is more thorough: it uses a byte[] instead of a char[] in every case and adds a coder field that records whether the bytes are LATIN1 or UTF16.
isLatin1() returns true when COMPACT_STRINGS is true and coder is LATIN1; methods such as charAt, equals, hashCode, indexOf, and substring rely on it to choose between StringLatin1 and StringUTF16.
Java 9 uses invokedynamic to call StringConcatFactory.makeConcatWithConstants for string concatenation. Where Java 8 always compiled concatenation into StringBuilder calls, Java 9 offers several strategies: BC_SB (equivalent to the Java 8 approach), BC_SB_SIZED, BC_SB_SIZED_EXACT, MH_SB_SIZED, MH_SB_SIZED_EXACT, and MH_INLINE_SIZED_EXACT. The default is MH_INLINE_SIZED_EXACT, and it can be overridden with -Djava.lang.invoke.stringConcat.