遇到了 Netty ByteBufUtil.indexOf 的一个小 BUG

0. 问题

我在解析 Redis Simple Strings 和 Errors 时用到了 Netty 的一个工具类 io.netty.buffer.ByteBufUtil 里的 indexOf(ByteBuf needle, ByteBuf haystack) 方法。也忘记了是如何找到了这个方法，但是一直这么用着。

最近升级了一下 Netty 版本（ 4.1.45.Final --> 4.1.72.Final），结果解析 +PONG\r\n 就出错了，原来能正确解析出 PONG，现在解析的结果是 PON。

简化的处理逻辑如下所示：

ByteBuf in = Unpooled.copiedBuffer("+PONG\r\n", CharsetUtil.UTF_8);
ByteBuf SEP_CRLF = Unpooled.copiedBuffer("\r\n", CharsetUtil.UTF_8);
in.readByte();

int index = ByteBufUtil.indexOf(SEP_CRLF, in);

System.out.printf("expect 5, actual %d%n", index);

4.1.72.Final 版本的执行结果是 expect 5, actual 4。

感觉上新版本返回的“相对 index”而不是之前的“绝对 index”。

1. ByteBufUtil.indexOf(ByteBuf needle, ByteBuf haystack) 的使用

通过 IDEA 的 Find Usages 工具发现这个这个方法只在 io.netty.handler.codec.http2.Http2ConnectionHandler 使用：

            // If the input so far doesn't match the preface, break the connection.
            if (bytesRead == 0 || !ByteBufUtil.equals(in, in.readerIndex(),
                                                      clientPrefaceString, clientPrefaceString.readerIndex(),
                                                      bytesRead)) {
                int maxSearch = 1024; // picked because 512 is too little, and 2048 too much
                int http1Index =
                    ByteBufUtil.indexOf(HTTP_1_X_BUF, in.slice(in.readerIndex(), min(in.readableBytes(), maxSearch)));
                if (http1Index != -1) {
                    String chunk = in.toString(in.readerIndex(), http1Index - in.readerIndex(), CharsetUtil.US_ASCII);
                    throw connectionError(PROTOCOL_ERROR, "Unexpected HTTP/1.x request: %s", chunk);
                }
                String receivedBytes = hexDump(in, in.readerIndex(),
                                               min(in.readableBytes(), clientPrefaceString.readableBytes()));
                throw connectionError(PROTOCOL_ERROR, "HTTP/2 client preface string missing or corrupt. " +
                                                      "Hex dump for received bytes: %s", receivedBytes);
            }

在一个异常的分支，通过 indexOf 搜索 HTTP_1_X_BUF 关键字，判断该种特定的错误类型。

在此处， haystack 参数的 readerIndex 肯定是 0，所以不会触发我遇到的问题。

2. ByteBufUtil.indexOf(ByteBuf needle, ByteBuf haystack) 的单元测试

又看了一下单元测试：

        final ByteBuf needle = Unpooled.copiedBuffer("abc12", CharsetUtil.UTF_8);
        haystack.readerIndex(1);
        needle.readerIndex(1);
        assertEquals(0, ByteBufUtil.indexOf(needle, haystack));
        haystack.readerIndex(2);
        needle.readerIndex(3);
        assertEquals(1, ByteBufUtil.indexOf(needle, haystack));
        haystack.readerIndex(1);
        needle.readerIndex(2);
        assertEquals(1, ByteBufUtil.indexOf(needle, haystack));
        haystack.release();

从这部分来看，indexOf 的预期返回值就是“相对 index”。

这就有点儿绕晕了，那在 4.1.45.Final 这个单元测试是如何 pass 的？肯定报错呀。

翻了一下 commits 记录，发现该方法之前就没有单元测试，现在我看到的单元测试就是在代码优化时添加的：

3. ByteBufUtil.indexOf(ByteBuf needle, ByteBuf haystack) 的实现

看了一下现在的实现，说实话有点儿拉低 Netty 的代码质量，也不知道这个 pr 是如何 merge 进来的。

/**
 * Returns the reader index of needle in haystack, or -1 if needle is not in haystack.
 * This method uses the <a href="https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm">Two-Way
 * string matching algorithm</a>, which yields O(1) space complexity and excellent performance.
 */
public static int indexOf(ByteBuf needle, ByteBuf haystack) {
    if (haystack == null || needle == null) {
        return -1;
    }

    if (needle.readableBytes() > haystack.readableBytes()) {
        return -1;
    }

    int n = haystack.readableBytes();
    int m = needle.readableBytes();
    if (m == 0) {
        return 0;
    }

    // When the needle has only one byte that can be read,
    // the firstIndexOf method needs to be called
    if (m == 1) {
        return firstIndexOf((AbstractByteBuf) haystack, haystack.readerIndex(),
                haystack.writerIndex(), needle.getByte(needle.readerIndex()));
    }

    int i;
    int j = 0;
    int aStartIndex = needle.readerIndex();
    int bStartIndex = haystack.readerIndex();
    long suffixes =  maxSuf(needle, m, aStartIndex, true);
    long prefixes = maxSuf(needle, m, aStartIndex, false);
    int ell = Math.max((int) (suffixes >> 32), (int) (prefixes >> 32));
    int per = Math.max((int) suffixes, (int) prefixes);
    int memory;
    int length = Math.min(m - per, ell + 1);

    if (equals(needle, aStartIndex, needle, aStartIndex + per,  length)) {
        memory = -1;
        while (j <= n - m) {
            i = Math.max(ell, memory) + 1;
            while (i < m && needle.getByte(i + aStartIndex) == haystack.getByte(i + j + bStartIndex)) {
                ++i;
            }
            if (i > n) {
                return -1;
            }
            if (i >= m) {
                i = ell;
                while (i > memory && needle.getByte(i + aStartIndex) == haystack.getByte(i + j + bStartIndex)) {
                    --i;
                }
                if (i <= memory) {
                    return j;
                }
                j += per;
                memory = m - per - 1;
            } else {
                j += i - ell;
                memory = -1;
            }
        }
    } else {
        per = Math.max(ell + 1, m - ell - 1) + 1;
        while (j <= n - m) {
            i = ell + 1;
            while (i < m && needle.getByte(i + aStartIndex) == haystack.getByte(i + j + bStartIndex)) {
                ++i;
            }
            if (i > n) {
                return -1;
            }
            if (i >= m) {
                i = ell;
                while (i >= 0 && needle.getByte(i + aStartIndex) == haystack.getByte(i + j + bStartIndex)) {
                    --i;
                }
                if (i < 0) {
                    return j;
                }
                j += per;
            } else {
                j += i - ell;
            }
        }
    }
    return -1;
}

private static long maxSuf(ByteBuf x, int m, int start, boolean isSuffix) {
    int p = 1;
    int ms = -1;
    int j = start;
    int k = 1;
    byte a;
    byte b;
    while (j + k < m) {
        a = x.getByte(j + k);
        b = x.getByte(ms + k);
        boolean suffix = isSuffix ? a < b : a > b;
        if (suffix) {
            j += k;
            k = 1;
            p = j - ms;
        } else if (a == b) {
            if (k != p) {
                ++k;
            } else {
                j += p;
                k = 1;
            }
        } else {
            ms = j;
            j = ms + 1;
            k = p = 1;
        }
    }
    return ((long) ms << 32) + p;
}

大部分情况确实返回的是“相对 index”，但是在 needle 长度为 1 的情况下，调用的是 ByteBuf 的 indexOf(int fromIndex, int toIndex, byte value) 方法，返回的是“绝对 index”。

再看之前 4.1.45.Final 版本的实现，是一个很简单的暴力搜索算法：

/**
 * Returns the reader index of needle in haystack, or -1 if needle is not in haystack.
 */
public static int indexOf(ByteBuf needle, ByteBuf haystack) {
    // TODO: maybe use Boyer Moore for efficiency.
    int attempts = haystack.readableBytes() - needle.readableBytes() + 1;
    for (int i = 0; i < attempts; i++) {
        if (equals(needle, needle.readerIndex(),
                   haystack, haystack.readerIndex() + i,
                   needle.readableBytes())) {
            return haystack.readerIndex() + i;
        }
    }
    return -1;
}

返回的是“绝对 index”。

给 Netty 提了一个 issue，但是还没有回应。。。

4. 最后

是不是优秀的 Java 开发者都转别的语言了？以小见大，Netty 的代码质量真的是在下滑。
对一个开源项目来说，单元测试真是太重要了。

不过新的 Two-Way string matching algorithm 看起来还挺有趣，看 wikipedia 上的介绍，glibc 的 strstr 函数也是基于该算法实现的，并且还有 SSE2 硬件优化实现，接下来深入学习一下。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 206,013评论 6赞 481
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 88,205评论 2赞 382
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 152,370评论 0赞 342
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 55,168评论 1赞 278
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 64,153评论 5赞 371
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,954评论 1赞 283
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 38,271评论 3赞 399
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,916评论 0赞 259
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 43,382评论 1赞 300
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,877评论 2赞 323
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,989评论 1赞 333
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,624评论 4赞 322
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 39,209评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 30,199评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,418评论 1赞 260
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 45,401评论 2赞 352
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,700评论 2赞 345