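Author: lds (lds2012@gmail.com)
Date: 2017-04-11
Preface
AndFix is an Android hot-fix framework open-sourced by Alibaba. Its basic principle is to replace methods through JNI. This lets an Android app be hot-fixed: bugs in a released app can be patched temporarily without shipping a new version.
There are many hot-fix techniques. AndFix uses native method replacement. Its advantages are that a fix takes effect immediately and costs no performance. Its drawbacks are that it can only modify methods and that its compatibility may be a problem.
Although the principle is fairly simple, understanding it in depth requires solid knowledge of JNI, of both the Dalvik and ART virtual machines, and even of the source code of several ART versions. Because the overall difficulty is considerable, this article does not go into VM implementation details and looks only at the JNI-related parts.
Source repository: https://github.com/alibaba/AndFix
Source version: 0.5.0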
1. Registering the native methods
The native methods in AndFix.java:
package com.alipay.euler.andfix;
// ...
public class AndFix {
private static native boolean setup(boolean isArt, int apilevel);
private static native void replaceMethod(Method dest, Method src);
private static native void setFieldFlag(Field field);
}
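These native methods are registered dynamically (through RegisterNatives with the gMethods table below) rather than statically. It is commonly claimed that dynamic registration is more efficient, because the VM does not have to locate the native function by its mangled JNI name (with static registration, that lookup happens when the method is first called).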
static JNINativeMethod gMethods[] = {
/* name, signature, funcPtr */
{ "setup", "(ZI)Z",
(void*) setup },
{ "replaceMethod", "(Ljava/lang/reflect/Method;Ljava/lang/reflect/Method;)V",
(void*) replaceMethod },
{ "setFieldFlag", "(Ljava/lang/reflect/Field;)V",
(void*) setFieldFlag },
};
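For comparison, with static registration the VM locates the native function by a name derived from the class and method, following the JNI specification's mangling rules: the prefix `Java_`, the mangled fully qualified class name, an underscore, then the mangled method name (overloaded methods additionally append `__` and the mangled argument signature). The sketch below builds the short form of that name. `jniMangle` and `jniShortName` are hypothetical helpers, not part of AndFix or of the JNI API, and they handle ASCII input only.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: escape one name component according to the JNI
// specification's mangling rules (ASCII input only).
std::string jniMangle(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else if (c == '/' || c == '.') {
            out += '_';                      // package separator
        } else if (c == '_') {
            out += "_1";
        } else if (c == ';') {
            out += "_2";
        } else if (c == '[') {
            out += "_3";
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "_0%04x", c);  // e.g. '$' -> "_00024"
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: the symbol a statically registered native method
// would have to be exported under.
std::string jniShortName(const std::string& className, const std::string& methodName) {
    return "Java_" + jniMangle(className) + "_" + jniMangle(methodName);
}
```

With RegisterNatives and the gMethods table, none of these exported names are needed: each function pointer is bound directly to its name/signature pair.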
Each of the three native functions is routed to a different implementation depending on whether the current runtime is Dalvik or ART, and for ART it is further routed to an implementation for the specific ART version.
Current runtime | Implementation source file |
---|---|
Dalvik | /jni/dalvik/dalvik_method_replace.cpp |
ART, Android 4.4 (API 19) | /jni/art/art_method_replace_4_4.cpp |
ART, Android 5.0 (API > 19) | /jni/art/art_method_replace_5_0.cpp |
ART, Android 5.1 (API > 21) | /jni/art/art_method_replace_5_1.cpp |
ART, Android 6.0 (API > 22) | /jni/art/art_method_replace_6_0.cpp |
ART, Android 7.0 (API > 23) | /jni/art/art_method_replace_7_0.cpp |
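Two things can be seen from this:
- First, ART first appeared in Android 4.4 (as an opt-in developer option; it became the default runtime in Android 5.0).
- Second, ART changed in nearly every subsequent Android release. This is where the poor compatibility of AndFix's approach shows most clearly: whenever the Android version changes, the implementation has to be rewritten for that version's VM.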
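The routing in the table can be sketched as one dispatch function. The enum and function names are hypothetical (this is not AndFix's code); the thresholds follow the table, where API 19 is Android 4.4, 21 is 5.0, 22 is 5.1, 23 is 6.0 and 24 is 7.0.

```cpp
#include <cassert>

// Hypothetical labels for the per-version implementations listed in the table.
enum class ReplaceImpl { Dalvik, Art4_4, Art5_0, Art5_1, Art6_0, Art7_0 };

// Dalvik first, then ART by API level, checking the newest threshold first.
ReplaceImpl chooseImpl(bool isArt, int apiLevel) {
    if (!isArt) return ReplaceImpl::Dalvik;
    if (apiLevel > 23) return ReplaceImpl::Art7_0;
    if (apiLevel > 22) return ReplaceImpl::Art6_0;
    if (apiLevel > 21) return ReplaceImpl::Art5_1;
    if (apiLevel > 19) return ReplaceImpl::Art5_0;
    return ReplaceImpl::Art4_4;
}
```

In this sketch every API level above 23 falls through to the 7.0 branch, which is exactly the compatibility risk described in the second point: a newer ART would silently be treated as if it were 7.0.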
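Although there are separate implementations for each VM and version, the code shows that the principle is the same throughout; the implementations differ only in which VM-specific APIs they call. The rest of this article therefore examines only the traditional Dalvik implementation.
2. Initialization (setup)
One point worth noting here is how to check whether the current runtime is Dalvik or ART. The official documentation says:
You can verify which runtime is in use by calling System.getProperty("java.vm.version"). If ART is in use, the property's value is "2.0.0" or higher.
AndFix implements the check as follows: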
final String vmVersion = System.getProperty("java.vm.version");
boolean isArt = vmVersion != null && vmVersion.startsWith("2");
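This code is a little sloppy. The documentation says ART's version is 2.0.0 or higher, but the code only checks whether the string starts with "2". If ART's version number ever reached 3, the check would misidentify it, which is a compatibility problem.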
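A stricter check compares the major version numerically, as the documentation's "2.0.0 or higher" wording suggests. The sketch is written in C++ for consistency with the other examples; `isArtVersion` is a hypothetical name, and the same idea in Java would parse the text before the first dot.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: true if a "java.vm.version" value denotes ART,
// i.e. its leading (major) version number is >= 2.
bool isArtVersion(const std::string& vmVersion) {
    const char* begin = vmVersion.c_str();
    char* end = nullptr;
    long major = std::strtol(begin, &end, 10);  // parses "2" out of "2.1.0"
    if (end == begin) {
        return false;  // no leading number: not a version string we understand
    }
    return major >= 2;
}
```

Unlike startsWith("2"), this also accepts values such as "3.0.0" or "10.1".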
jboolean setup(JNIEnv* env, jclass clazz, jboolean isart, jint apilevel);
The setup function performs initialization. In the Dalvik implementation, its main job is to obtain pointers to two functions inside libdvm.so, dvmDecodeIndirectRef and dvmThreadSelf, so that they can be called later.
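Because these functions are internal to libdvm.so rather than part of the public NDK API, they have to be looked up at run time with dlopen/dlsym. If libdvm.so was built as C++, the exported names are Itanium-mangled (for example `_Z20dvmDecodeIndirectRefP6ThreadP8_jobject` for `dvmDecodeIndirectRef(Thread*, jobject)`, and `_Z13dvmThreadSelfv` for `dvmThreadSelf()`), so a robust lookup tries the mangled name first and falls back to the plain C name. The helper `resolveFirst` is a hypothetical sketch of that idea, not AndFix's code. To stay runnable off-device, the usage below resolves a libc symbol in the current process instead of opening libdvm.so.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <initializer_list>

// Hypothetical helper: try each candidate symbol name in order and return
// the first one that resolves, or nullptr if none do.
// On a Dalvik device the handle would come from dlopen("libdvm.so", RTLD_NOW)
// and the candidates would be e.g.
//   {"_Z20dvmDecodeIndirectRefP6ThreadP8_jobject", "dvmDecodeIndirectRef"}.
void* resolveFirst(void* handle, std::initializer_list<const char*> names) {
    for (const char* name : names) {
        if (void* sym = dlsym(handle, name)) {
            return sym;
        }
    }
    return nullptr;
}
```

Storing the result in a typed function pointer (as the `_fnPtr` variables in the replaceMethod code do) lets the rest of the code call the VM function normally.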
2.1 dvmDecodeIndirectRef()
First, here is the definition of dvmDecodeIndirectRef in the Dalvik VM:
/*
* Convert an indirect reference to an Object reference. The indirect
* reference may be local, global, or weak-global.
*
* If "jobj" is NULL, or is a weak global reference whose reference has
* been cleared, this returns NULL. If jobj is an invalid indirect
* reference, kInvalidIndirectRefObject is returned.
*
* Note "env" may be NULL when decoding global references.
*/
Object* dvmDecodeIndirectRef(Thread* self, jobject jobj) {}
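This function converts a jobject into Dalvik's internal Object type. In Dalvik, Object is the common base of:
- class objects (ClassObject)
- array objects (ArrayObject)
- data objects (DataObject)
- string objects (StringObject)
It can therefore be used to obtain a ClassObject, as in the source of NewObject: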
static jobject NewObject(JNIEnv* env, jclass jclazz, jmethodID methodID, ...) {
ScopedJniThreadState ts(env);
ClassObject* clazz = (ClassObject*) dvmDecodeIndirectRef(ts.self(), jclazz);
if (!canAllocClass(clazz) || (!dvmIsClassInitialized(clazz) && !dvmInitClass(clazz))) {
assert(dvmCheckException(ts.self()));
return NULL;
}
Object* newObj = dvmAllocObject(clazz, ALLOC_DONT_TRACK);
jobject result = addLocalReference(ts.self(), newObj);
if (newObj != NULL) {
JValue unused;
va_list args;
va_start(args, methodID);
dvmCallMethodV(ts.self(), (Method*) methodID, newObj, true, &unused, args);
va_end(args);
}
return result;
}
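A jobject handed to native code is an indirect reference, not a raw pointer, so it has to be decoded through the VM's reference tables before it can be cast to `ClassObject*`; the cast itself is valid because ClassObject, like the other object types, begins with the common Object header. The model below is a heavily simplified, hypothetical sketch: the struct layouts are trimmed, and the handle encoding (slot index shifted left by 2, reference kind in the low 2 bits) only imitates the general idea of Dalvik's indirect reference tables. It also returns nullptr for invalid handles, where Dalvik returns kInvalidIndirectRefObject.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trimmed, hypothetical stand-ins for Dalvik's object model.
struct ClassObject;
struct Object {                 // common header shared by every heap object
    ClassObject* clazz;
};
struct ClassObject : Object {   // a class is itself an Object
    const char* descriptor;
};

enum RefKind : std::uintptr_t { kInvalid = 0, kLocal = 1, kGlobal = 2, kWeakGlobal = 3 };
using jobject_model = void*;    // stands in for JNI's opaque jobject

// Hypothetical reference table: native code only ever sees encoded handles.
struct RefTable {
    RefKind kind;
    std::vector<Object*> slots;

    jobject_model add(Object* obj) {
        slots.push_back(obj);
        std::uintptr_t index = slots.size() - 1;
        return reinterpret_cast<jobject_model>((index << 2) | kind);
    }

    // Analogue of dvmDecodeIndirectRef: handle -> Object*, nullptr if invalid.
    Object* decode(jobject_model ref) const {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(ref);
        if (bits == 0 || (bits & 3u) != kind) return nullptr;
        std::uintptr_t index = bits >> 2;
        return index < slots.size() ? slots[index] : nullptr;
    }
};
```

After decoding, `static_cast<ClassObject*>(obj)` plays the role of the `(ClassObject*)` cast in NewObject.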
2.2 dvmThreadSelf()
/*
* Like pthread_self(), but on a Thread*.
*/
Thread* dvmThreadSelf()
{
return (Thread*) pthread_getspecific(gDvm.pthreadKeySelf);
}
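It returns the Thread structure of the calling thread, which Dalvik keeps in thread-specific data under the key gDvm.pthreadKeySelf.
3. Setting field access flags (setFieldFlag)
This function makes the fields of the class being patched public; since it takes a single Field, it is called once per field.
The implementation is straightforward: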
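The pattern behind dvmThreadSelf is ordinary POSIX thread-specific data: each attached thread stores a pointer to its own Thread structure under one process-wide key, and pthread_getspecific returns the calling thread's value. A minimal sketch of that pattern, with a hypothetical `Thread` struct and hypothetical helper names (not Dalvik's actual code):

```cpp
#include <cassert>
#include <pthread.h>

struct Thread { int threadId; };   // hypothetical stand-in for Dalvik's Thread

static pthread_key_t gKeySelf;      // plays the role of gDvm.pthreadKeySelf

// Create the key once, at "VM startup".
bool initThreadKey() { return pthread_key_create(&gKeySelf, nullptr) == 0; }

// Called when a thread attaches to the VM: remember its Thread*.
void attachCurrentThread(Thread* self) { pthread_setspecific(gKeySelf, self); }

// Equivalent of dvmThreadSelf(): nullptr if the calling thread never attached.
Thread* threadSelf() { return static_cast<Thread*>(pthread_getspecific(gKeySelf)); }

// Runs on a second thread and reports what threadSelf() sees there.
void* peekFromOtherThread(void*) { return threadSelf(); }
```

Because the value is per thread, a thread that never attached (like the second thread in the test) sees nullptr even after the main thread has registered itself.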
void dalvik_setFieldFlag(JNIEnv* env, jobject field) {
Field* dalvikField = (Field*) env->FromReflectedField(field);
// Clear ACC_PRIVATE and set ACC_PUBLIC; all other flag bits are kept
dalvikField->accessFlags = dalvikField->accessFlags & (~ACC_PRIVATE)
| ACC_PUBLIC;
LOGD("dalvik_setFieldFlag: %d ", dalvikField->accessFlags);
}
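The flag arithmetic is plain bit manipulation on the access-flag values defined by the dex format (ACC_PUBLIC = 0x0001, ACC_PRIVATE = 0x0002, ACC_PROTECTED = 0x0004, ACC_STATIC = 0x0008, ACC_FINAL = 0x0010). A standalone sketch of the same expression, using a hypothetical `makePublic` helper:

```cpp
#include <cassert>
#include <cstdint>

// Access-flag bits as defined by the dex format.
constexpr std::uint32_t ACC_PUBLIC    = 0x0001;
constexpr std::uint32_t ACC_PRIVATE   = 0x0002;
constexpr std::uint32_t ACC_PROTECTED = 0x0004;
constexpr std::uint32_t ACC_STATIC    = 0x0008;
constexpr std::uint32_t ACC_FINAL     = 0x0010;

// Hypothetical helper with the same expression as dalvik_setFieldFlag:
// clear ACC_PRIVATE, set ACC_PUBLIC, leave every other bit untouched.
std::uint32_t makePublic(std::uint32_t flags) {
    return (flags & ~ACC_PRIVATE) | ACC_PUBLIC;
}
```

Like the original, it does not clear ACC_PROTECTED, so a protected field ends up with both ACC_PUBLIC and ACC_PROTECTED set.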
4. Replacing the method (replaceMethod)
Step one: mark the class that provides the replacement method as already initialized:
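// Call Method.getDeclaringClass() on dest (jClassMethod caches that method ID)
jobject clazz = env->CallObjectMethod(dest, jClassMethod);
// Decode the jobject with the function pointers resolved from libdvm.so in setup
ClassObject* clz = (ClassObject*) dvmDecodeIndirectRef_fnPtr(
dvmThreadSelf_fnPtr(), clazz);
// Mark the class as initialized without calling dvmInitClass
clz->status = CLASS_INITIALIZED;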
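Unlike the Xposed framework, which calls dvmInitClass to actually initialize the class, AndFix apparently only sets status here.
TODO: Why is the class not actually initialized, and why does status need to be set at all?
Then the method itself is replaced directly: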
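One way to see what the status assignment changes: Dalvik's dvmIsClassInitialized is a pure comparison of status with CLASS_INITIALIZED, and paths such as NewObject above only call dvmInitClass when that check fails. The model below uses a hypothetical, trimmed status enum (names taken from Dalvik, values not) and a hypothetical counter for how often the static initializer runs. It only models the check; it is not an answer to why AndFix chose this approach.

```cpp
#include <cassert>

// Trimmed subset of Dalvik's ClassStatus names; the numeric values are not Dalvik's.
enum ClassStatus { CLASS_LOADED, CLASS_VERIFIED, CLASS_INITIALIZING, CLASS_INITIALIZED };

struct ClassModel {
    ClassStatus status;
    int clinitRuns;   // hypothetical: how many times <clinit> has executed
};

// Mirrors dvmIsClassInitialized: a pure status comparison.
bool isClassInitialized(const ClassModel& c) { return c.status == CLASS_INITIALIZED; }

// Stand-in for dvmInitClass: runs <clinit> and marks the class initialized.
bool initClass(ClassModel& c) {
    c.status = CLASS_INITIALIZING;
    ++c.clinitRuns;
    c.status = CLASS_INITIALIZED;
    return true;
}

// The guard used on allocation paths such as NewObject above.
bool ensureInitialized(ClassModel& c) {
    return isClassInitialized(c) || initClass(c);
}
```

When status is forced to CLASS_INITIALIZED first, the guard short-circuits and the static initializer in this model never runs.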
// meth: the method being fixed (src); target: the replacement method (dest)
Method* meth = (Method*) env->FromReflectedMethod(src);
Method* target = (Method*) env->FromReflectedMethod(dest);
LOGD("dalvikMethod: %s", meth->name);
// meth->clazz = target->clazz;
meth->accessFlags |= ACC_PUBLIC;
meth->methodIndex = target->methodIndex;
meth->jniArgInfo = target->jniArgInfo;
meth->registersSize = target->registersSize;
meth->outsSize = target->outsSize;
meth->insSize = target->insSize;
meth->prototype = target->prototype;
meth->insns = target->insns;
meth->nativeFunc = target->nativeFunc;
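All fields except clazz, name, shorty, fastJni, noRef, shouldTrace, registerMap and inProfile are overwritten with the values from the replacement method. The one nuance is accessFlags: it is not copied from the replacement, it only has ACC_PUBLIC added.
For the meaning of each field, see the definition of the Method struct in the Dalvik source: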
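The effect of the field copy can be modelled with a trimmed Method struct in which `insns` is a plain function pointer standing in for the method body. Everything here (`MethodModel`, `replaceMethodModel`, the two bodies) is hypothetical. The point it illustrates: callers keep using the same `Method*` for the buggy method, while every field that describes its body now comes from the patch.

```cpp
#include <cassert>
#include <cstdint>

using Body = int (*)(int);   // hypothetical stand-in for insns/nativeFunc ("the code")

// Hypothetical, trimmed model of Dalvik's Method; field names follow the real struct.
struct MethodModel {
    const char*   name;
    std::uint32_t accessFlags;
    std::uint16_t methodIndex;
    std::uint16_t registersSize;
    std::uint16_t outsSize;
    std::uint16_t insSize;
    Body          insns;
};

constexpr std::uint32_t ACC_PUBLIC = 0x0001;

// Same shape as the replacement above: keep identity fields (name here; clazz,
// shorty, ... in the real struct), force public, copy what describes the body.
void replaceMethodModel(MethodModel* meth, const MethodModel* target) {
    meth->accessFlags  |= ACC_PUBLIC;
    meth->methodIndex   = target->methodIndex;
    meth->registersSize = target->registersSize;
    meth->outsSize      = target->outsSize;
    meth->insSize       = target->insSize;
    meth->insns         = target->insns;
}

int buggyBody(int x) { return x - 1; }   // the shipped bug
int fixedBody(int x) { return x + 1; }   // the patched behaviour
```

Any caller holding a pointer to the original method picks up the new body on its next call, which is why the fix takes effect immediately.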
struct Method {
/* the class we are a part of */
ClassObject* clazz;
/* access flags; low 16 bits are defined by spec (could be u2?) */
u4 accessFlags;
/*
* For concrete virtual methods, this is the offset of the method
* in "vtable".
*
* For abstract methods in an interface class, this is the offset
* of the method in "iftable[n]->methodIndexArray".
*/
u2 methodIndex;
/*
* Method bounds; not needed for an abstract method.
*
* For a native method, we compute the size of the argument list, and
* set "insSize" and "registerSize" equal to it.
*/
u2 registersSize; /* ins + locals */
u2 outsSize;
u2 insSize;
/* method name, e.g. "<init>" or "eatLunch" */
const char* name;
/*
* Method prototype descriptor string (return and argument types).
*
* TODO: This currently must specify the DexFile as well as the proto_ids
* index, because generated Proxy classes don't have a DexFile. We can
* remove the DexFile* and reduce the size of this struct if we generate
* a DEX for proxies.
*/
DexProto prototype;
/* short-form method descriptor string */
const char* shorty;
/*
* The remaining items are not used for abstract or native methods.
* (JNI is currently hijacking "insns" as a function pointer, set
* after the first call. For internal-native this stays null.)
*/
/* the actual code */
const u2* insns; /* instructions, in memory-mapped .dex */
/* JNI: cached argument and return-type hints */
int jniArgInfo;
/*
* JNI: native method ptr; could be actual function or a JNI bridge. We
* don't currently discriminate between DalvikBridgeFunc and
* DalvikNativeFunc; the former takes an argument superset (i.e. two
* extra args) which will be ignored. If necessary we can use
* insns==NULL to detect JNI bridge vs. internal native.
*/
DalvikBridgeFunc nativeFunc;
/*
* JNI: true if this static non-synchronized native method (that has no
* reference arguments) needs a JNIEnv* and jclass/jobject. Libcore
* uses this.
*/
bool fastJni;
/*
* JNI: true if this method has no reference arguments. This lets the JNI
* bridge avoid scanning the shorty for direct pointers that need to be
* converted to local references.
*
* TODO: replace this with a list of indexes of the reference arguments.
*/
bool noRef;
/*
* JNI: true if we should log entry and exit. This is the only way
* developers can log the local references that are passed into their code.
* Used for debugging JNI problems in third-party code.
*/
bool shouldTrace;
/*
* Register map data, if available. This will point into the DEX file
* if the data was computed during pre-verification, or into the
* linear alloc area if not.
*/
const RegisterMap* registerMap;
/* set if method was called during method profiling */
bool inProfile;
};
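Conclusion
Besides the Java code and the NDK code, there is another important piece: the tool that generates patches automatically. Understanding it requires fairly deep knowledge of the dex file format. Alibaba has not open-sourced this tool directly, and it has not been updated in nearly two years.
In short, there is plenty of analysis of AndFix's mechanism online, mainly because the framework's principle is direct, even blunt, and easy to understand. Looking at the details, however, building such a framework yourself would require a broad and deep understanding of the Dalvik VM, ART, the dex file format, JNI, and more before you could arrive at a solution that looks this simple. This shows how helpful knowledge of Android's internals often is, especially when implementing advanced features such as hot fixing, and that is worth learning from.
References: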