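String Constant Pool and String Concatenation
Tags (space-separated): notes
The JDK used in this article is Java 8; other versions are called out explicitly where they come up.
Question
Snippet 1: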
String str="a"+"b"+"c";
Snippet 2:
String str1="a";
String str=str1+"b"+"c";
Snippet 3:
String str1="b";
String str="a"+str1+"c";
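After each of these snippets runs, is the String constant pool in the same state? Let's find out.
The experiments below inspect the Class file constant pool, not the String constant pool. The two are related, though: at run time, the string literals in the Class file constant pool are added to the String constant pool, so the Class file constant pool is enough to verify what ends up there. From here on, "constant pool" means the Class file constant pool.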
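The link between the two pools can also be observed at run time. The following is a minimal sketch, not part of the original experiment; the class name LiteralPoolDemo and its method names are made up for illustration. It relies only on == and String.intern().

```java
final class LiteralPoolDemo {
    // Two separate occurrences of the same literal in this class.
    static String first()  { return "abc"; }
    static String second() { return "abc"; }

    // Both occurrences resolve to the same pooled String object.
    static boolean sameLiteralSameObject() {
        return first() == second();
    }

    // intern() returns the pooled instance; for a literal that is the literal itself.
    static boolean literalIsAlreadyPooled() {
        String literal = "abc";
        return literal.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralSameObject());   // true
        System.out.println(literalIsAlreadyPooled());  // true
    }
}
```

Both checks compare references rather than contents, which is what makes them a statement about the pool and not just about equal text.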
Experiment
We put each snippet into the main method of TestClass, compile it with javac TestClass.java, then print the compiled bytecode with javap -v TestClass and look at the class constant pool.
Snippet 1:
Constant pool:
#1 = Methodref #4.#13 // java/lang/Object."<init>":()V
#2 = String #14 // abc
#3 = Class #15 // com/dev/tools/kit/TestClass
#4 = Class #16 // java/lang/Object
#5 = Utf8 <init>
#6 = Utf8 ()V
#7 = Utf8 Code
#8 = Utf8 LineNumberTable
#9 = Utf8 main
#10 = Utf8 ([Ljava/lang/String;)V
#11 = Utf8 SourceFile
#12 = Utf8 TestClass.java
#13 = NameAndType #5:#6 // "<init>":()V
#14 = Utf8 abc
#15 = Utf8 com/dev/tools/kit/TestClass
#16 = Utf8 java/lang/Object
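The constant pool contains a single string, abc, and no separate a, b, or c. In other words, the compiler optimized this case and added only the final result to the constant pool. The compiled method confirms it: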
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=2, args_size=1
0: ldc #2 // String abc
2: astore_1
3: return
LineNumberTable:
line 5: 0
line 6: 3
Look at the instruction at offset 0: the snippet was effectively compiled as if it had been written as
String str="abc"
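The same folding is visible at run time: because the concatenation of literals is a compile-time constant, it refers to the very same pooled object as the literal "abc". A small sketch (the class and method names are illustrative only):

```java
final class ConstantFoldingDemo {
    // "a" + "b" + "c" is a constant expression, so javac stores only "abc".
    static boolean foldedIsSameAsLiteral() {
        String folded = "a" + "b" + "c";
        String literal = "abc";
        return folded == literal; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameAsLiteral()); // true
    }
}
```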
Snippet 2:
Constant pool:
#1 = Methodref #10.#19 // java/lang/Object."<init>":()V
#2 = String #20 // a
#3 = Class #21 // java/lang/StringBuilder
#4 = Methodref #3.#19 // java/lang/StringBuilder."<init>":()V
#5 = Methodref #3.#22 // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#6 = String #23 // b
#7 = String #24 // c
#8 = Methodref #3.#25 // java/lang/StringBuilder.toString:()Ljava/lang/String;
#9 = Class #26 // com/dev/tools/kit/TestClass
#10 = Class #27 // java/lang/Object
#11 = Utf8 <init>
#12 = Utf8 ()V
#13 = Utf8 Code
#14 = Utf8 LineNumberTable
#15 = Utf8 main
#16 = Utf8 ([Ljava/lang/String;)V
#17 = Utf8 SourceFile
#18 = Utf8 TestClass.java
#19 = NameAndType #11:#12 // "<init>":()V
#20 = Utf8 a
#21 = Utf8 java/lang/StringBuilder
#22 = NameAndType #28:#29 // append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#23 = Utf8 b
#24 = Utf8 c
#25 = NameAndType #30:#31 // toString:()Ljava/lang/String;
#26 = Utf8 com/dev/tools/kit/TestClass
#27 = Utf8 java/lang/Object
#28 = Utf8 append
#29 = Utf8 (Ljava/lang/String;)Ljava/lang/StringBuilder;
#30 = Utf8 toString
#31 = Utf8 ()Ljava/lang/String;
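The constant pool is much larger than in the previous snippet. Looking at the strings first, we find a, b, and c but no abc, plus several StringBuilder entries such as the append and toString methods. Here is the compiled code for this case: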
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=3, args_size=1
0: ldc #2 // String a
2: astore_1
3: new #3 // class java/lang/StringBuilder
6: dup
7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
10: aload_1
11: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
14: ldc #6 // String b
16: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: ldc #7 // String c
21: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
27: astore_2
28: return
LineNumberTable:
line 5: 0
line 6: 3
line 7: 28
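Starting again from offset 0 and reading the comments on the right, the picture is clear: the code creates a new StringBuilder, appends a, b, and c to it, and calls toString to get a new string. In this case a, b, and c are all in the constant pool, but abc is not.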
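A run-time counterpart of this observation, as a hedged sketch with made-up names. Unlike Snippet 2, this demo class does contain the literal "abc", purely so there is a pooled object to compare against. On Java 9 and later javac emits an invokedynamic call instead of StringBuilder for this concatenation, but as far as I know the result is still a freshly created string, so the checks should hold there too.

```java
final class VariableConcatDemo {
    // Concatenation involving a non-final variable is evaluated at run time.
    static String concat() {
        String str1 = "a";
        return str1 + "b" + "c";
    }

    // Same contents as the literal, but a different object.
    static boolean resultIsNewObject() {
        String str = concat();
        return str.equals("abc") && str != "abc";
    }

    // Interning the result leads back to the pooled literal.
    static boolean internFindsLiteral() {
        return concat().intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(resultIsNewObject());  // true
        System.out.println(internFindsLiteral()); // true
    }
}
```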
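Summary
Snippet 3 should behave the same way as Snippet 2; you can verify that yourself.
From the two snippets above we can draw the following conclusions. When + joins only string constants, the compiler folds them into a single string and adds that string to the constant pool. When a string variable (one that is not a compile-time constant) takes part in the + concatenation, the code is compiled on Java 8 into StringBuilder append calls followed by toString, which produces a new string. The concatenated result is not added to the constant pool; only the individual pieces are. In my snippet, of course, a is already added when str1 is declared.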
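One refinement of "variable" is worth a sketch: a final String variable initialized with a constant expression is itself a compile-time constant, so the compiler still folds the concatenation. A final variable whose initializer is not a constant expression does not qualify. The class and method names below are hypothetical.

```java
final class FinalVariableDemo {
    // final + constant initializer: str1 is a constant variable, so this folds to "abc".
    static boolean constantVariableIsFolded() {
        final String str1 = "a";
        String str = str1 + "b" + "c";
        return str == "abc";
    }

    // final alone is not enough: the initializer is a method call, not a constant.
    static boolean nonConstantInitializerIsNotFolded() {
        final String str1 = String.valueOf('a');
        String str = str1 + "b" + "c";
        return str.equals("abc") && str != "abc";
    }

    public static void main(String[] args) {
        System.out.println(constantVariableIsFolded());          // true
        System.out.println(nonConstantInitializerIsNotFolded()); // true
    }
}
```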
Verification
Snippet 4:
String str=new StringBuilder().append("a").append("b").append("c").toString();
Snippet 5:
String str1="a";
String str=new StringBuilder().append(str1).append("b").append("c").toString();
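Both match the conclusion above; try them yourself if you are interested.
Going further
Next, let's look at StringBuilder's append and toString methods to see why the final string is not added to the constant pool.
Digging into the source: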
public StringBuilder append(String str) {
    super.append(str);
    return this;
}
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}
Could it be that new String() does not add the string to the constant pool?
Let's check:
Snippet 6:
String str=new String("abc");
Constant pool:
#1 = Methodref #6.#15 // java/lang/Object."<init>":()V
#2 = Class #16 // java/lang/String
#3 = String #17 // abc
#4 = Methodref #2.#18 // java/lang/String."<init>":(Ljava/lang/String;)V
#5 = Class #19 // com/dev/tools/kit/TestClass
#6 = Class #20 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Utf8 LineNumberTable
#11 = Utf8 main
#12 = Utf8 ([Ljava/lang/String;)V
#13 = Utf8 SourceFile
#14 = Utf8 TestClass.java
#15 = NameAndType #7:#8 // "<init>":()V
#16 = Utf8 java/lang/String
#17 = Utf8 abc
#18 = NameAndType #7:#21 // "<init>":(Ljava/lang/String;)V
#19 = Utf8 com/dev/tools/kit/TestClass
#20 = Utf8 java/lang/Object
#21 = Utf8 (Ljava/lang/String;)V
Entry #17 is abc, so the string was added to the constant pool. Then what is going on? Could it be the choice of constructor?
Snippet 7:
char[] value=new char[]{'a','b','c'};
String str=new String(value,0,3);
Constant pool:
#1 = Methodref #5.#14 // java/lang/Object."<init>":()V
#2 = Class #15 // java/lang/String
#3 = Methodref #2.#16 // java/lang/String."<init>":([CII)V
#4 = Class #17 // com/dev/tools/kit/TestClass
#5 = Class #18 // java/lang/Object
#6 = Utf8 <init>
#7 = Utf8 ()V
#8 = Utf8 Code
#9 = Utf8 LineNumberTable
#10 = Utf8 main
#11 = Utf8 ([Ljava/lang/String;)V
#12 = Utf8 SourceFile
#13 = Utf8 TestClass.java
#14 = NameAndType #6:#7 // "<init>":()V
#15 = Utf8 java/lang/String
#16 = NameAndType #6:#19 // "<init>":([CII)V
#17 = Utf8 com/dev/tools/kit/TestClass
#18 = Utf8 java/lang/Object
#19 = Utf8 ([CII)V
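This time abc is indeed gone. So what is the difference between the two constructors? Why does one put the string into the constant pool while the other does not?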
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
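Comparing the two constructors, the biggest difference is that one takes a String and the other takes a char array. So there is good reason to suspect that <font color='red'>whenever the compiler encounters a string literal in the source, it adds it to the constant pool</font>.
A quick check:
Snippet 8: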
public static void main(String[] args) {
    test("abc");
}
public static void test(String string){
}
Constant pool:
#1 = Methodref #5.#16 // java/lang/Object."<init>":()V
#2 = String #17 // abc
#3 = Methodref #4.#18 // com/dev/tools/kit/TestClass.test:(Ljava/lang/String;)V
#4 = Class #19 // com/dev/tools/kit/TestClass
#5 = Class #20 // java/lang/Object
#6 = Utf8 <init>
#7 = Utf8 ()V
#8 = Utf8 Code
#9 = Utf8 LineNumberTable
#10 = Utf8 main
#11 = Utf8 ([Ljava/lang/String;)V
#12 = Utf8 test
#13 = Utf8 (Ljava/lang/String;)V
#14 = Utf8 SourceFile
#15 = Utf8 TestClass.java
#16 = NameAndType #6:#7 // "<init>":()V
#17 = Utf8 abc
#18 = NameAndType #12:#13 // test:(Ljava/lang/String;)V
#19 = Utf8 com/dev/tools/kit/TestClass
#20 = Utf8 java/lang/Object
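Entry #17 is abc.
So the conclusion so far is that <font color='red'>whenever the compiler encounters a string literal in the source, it adds it to the constant pool</font>, regardless of which String constructor is used.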
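The run-time behavior of the two constructors is consistent with this. In the sketch below (names are illustrative, not from the original experiment), both constructors produce a new heap object that is distinct from the pooled literal; the only reason abc shows up in the pool for Snippet 6 is the literal passed as the argument.

```java
final class ConstructorDemo {
    static String viaString() {
        return new String("abc"); // the argument literal is what lands in the pool
    }

    static String viaChars() {
        char[] value = new char[] {'a', 'b', 'c'};
        return new String(value, 0, 3); // no string literal involved
    }

    // Neither constructor returns the pooled instance.
    static boolean bothAreNewObjects() {
        String literal = "abc";
        return viaString() != literal && viaChars() != literal;
    }

    // Interning either result yields the same pooled literal.
    static boolean bothInternToLiteral() {
        String literal = "abc";
        return viaString().intern() == literal && viaChars().intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(bothAreNewObjects());   // true
        System.out.println(bothInternToLiteral()); // true
    }
}
```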