Q: Are there any problems with using a memory-mapped file (MappedByteBuffer) to read a very large file?
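A: This approach has one fatal flaw: a single mapping still cannot cover a very large file (larger than Integer.MAX_VALUE bytes), because the size parameter of FileChannel's map method is capped. The source code shows that when this argument exceeds Integer.MAX_VALUE, the method immediately throws IllegalArgumentException("Size exceeds Integer.MAX_VALUE"), so a single mapping remains unsuitable for very large files.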
Fundamentally, this is because java.nio.MappedByteBuffer directly extends java.nio.ByteBuffer, and ByteBuffer is indexed by int. A MappedByteBuffer can therefore address at most Integer.MAX_VALUE bytes, which is why FileChannel's map method validates its size argument.
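The size check can be observed without a large file, since the argument is validated before any mapping takes place. The following is a minimal sketch; the class name MapSizeLimitDemo and the method rejectsOversizedMapping are hypothetical names chosen for illustration, and the temporary file exists only so that a channel can be opened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the original article.
class MapSizeLimitDemo {

    /** Returns true if map() rejects a size one byte above Integer.MAX_VALUE. */
    static boolean rejectsOversizedMapping() {
        try {
            // An empty temporary file is enough: the size argument is checked up front.
            Path tmp = Files.createTempFile("map-limit", ".bin");
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                long tooLarge = (long) Integer.MAX_VALUE + 1;
                channel.map(FileChannel.MapMode.READ_ONLY, 0, tooLarge);
                return false; // only reached if the size was accepted
            } catch (IllegalArgumentException expected) {
                return true;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("oversized mapping rejected: " + rejectsOversizedMapping());
    }
}
```

The sketch checks only the exception type, not its message, because the message text is an implementation detail.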
We can work around this by using multiple memory-mapped regions, as shown below.
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;

class BigMappedByteBufferReader {
    private final MappedByteBuffer[] mappedByteBuffers;
    private final FileInputStream inputStream;
    private final FileChannel fileChannel;
    private final int bufferCount;
    private final int byteBufferSize;
    private int bufferCountIndex = 0;
    private byte[] byteBuffer;

    public BigMappedByteBufferReader(String fileName, int byteBufferSize) throws IOException {
        if (byteBufferSize <= 0) {
            throw new IllegalArgumentException("byteBufferSize must be positive");
        }
        this.inputStream = new FileInputStream(fileName);
        this.fileChannel = inputStream.getChannel();
        long fileSize = fileChannel.size();
        // Number of regions needed, each at most Integer.MAX_VALUE bytes (ceiling division).
        this.bufferCount = (int) ((fileSize + Integer.MAX_VALUE - 1) / Integer.MAX_VALUE);
        this.mappedByteBuffers = new MappedByteBuffer[bufferCount];
        this.byteBufferSize = byteBufferSize;
        long preLength = 0;
        for (int i = 0; i < bufferCount; i++) {
            // Every region spans Integer.MAX_VALUE bytes except possibly the last one.
            long regionSize = Math.min(Integer.MAX_VALUE, fileSize - preLength);
            mappedByteBuffers[i] = fileChannel.map(FileChannel.MapMode.READ_ONLY, preLength, regionSize);
            preLength += regionSize;
        }
    }

    // Reads the next chunk (at most byteBufferSize bytes) and returns its length, or -1 at end of file.
    public synchronized int read() {
        // Skip regions that are fully consumed, so a read never returns an empty chunk.
        while (bufferCountIndex < bufferCount && !mappedByteBuffers[bufferCountIndex].hasRemaining()) {
            bufferCountIndex++;
        }
        if (bufferCountIndex >= bufferCount) {
            return -1;
        }
        MappedByteBuffer current = mappedByteBuffers[bufferCountIndex];
        // The last chunk of a region may be shorter than byteBufferSize.
        int realSize = Math.min(byteBufferSize, current.remaining());
        byteBuffer = new byte[realSize];
        current.get(byteBuffer);
        return realSize;
    }

    public synchronized void close() throws IOException {
        fileChannel.close();
        inputStream.close();
        // Closing the channel does not unmap the regions; a mapping stays valid until its
        // buffer is garbage-collected, so drop the references here.
        Arrays.fill(mappedByteBuffers, null);
        bufferCountIndex = bufferCount; // any later read() returns -1
        byteBuffer = null;
    }

    // Returns the bytes produced by the most recent successful read().
    public synchronized byte[] getCurrentBytes() {
        return byteBuffer;
    }
}
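public class Test {
    public static void main(String[] args) throws Exception {
        BigMappedByteBufferReader reader = new BigMappedByteBufferReader("superbig.txt", 1024);
        try {
            while (reader.read() != -1) {
                byte[] bytes = reader.getCurrentBytes();
                // Process the current chunk of the very large file here.
                // Note: a chunk boundary can split a multi-byte character, so decoding
                // chunk by chunk like this is only safe for single-byte encodings.
                System.out.println(new String(bytes));
            }
        } finally {
            reader.close();
        }
    }
}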
That is one solution; in essence, it still comes down to splitting the file into pieces.
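The same splitting idea also supports random access: a long file offset can be translated into a region index plus an int offset within that region. The sketch below is an illustration rather than part of the original solution; the class ChunkedBuffers and its members are hypothetical names. It takes plain ByteBuffer instances so it can be tried with small heap buffers, and since MappedByteBuffer extends ByteBuffer, mapped regions of equal size (except possibly the last) could be passed in the same way.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: addresses several equally sized buffers with one long offset.
class ChunkedBuffers {
    private final ByteBuffer[] chunks;
    private final long chunkSize;

    // Assumes every chunk except possibly the last holds exactly chunkSize bytes.
    ChunkedBuffers(ByteBuffer[] chunks, long chunkSize) {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        this.chunks = chunks;
        this.chunkSize = chunkSize;
    }

    /** Returns the byte at the given global offset using an absolute read. */
    byte get(long position) {
        int chunkIndex = (int) (position / chunkSize); // which region holds the byte
        int offset = (int) (position % chunkSize);     // where it sits inside that region
        return chunks[chunkIndex].get(offset);         // absolute get, position is unchanged
    }

    public static void main(String[] args) {
        // Ten bytes 0..9 split into chunks of four bytes: [0..3], [4..7], [8..9].
        ByteBuffer[] parts = {
            ByteBuffer.wrap(new byte[] {0, 1, 2, 3}),
            ByteBuffer.wrap(new byte[] {4, 5, 6, 7}),
            ByteBuffer.wrap(new byte[] {8, 9})
        };
        ChunkedBuffers buffers = new ChunkedBuffers(parts, 4);
        System.out.println(buffers.get(5)); // read from the second chunk, offset 1
    }
}
```

Because the absolute get does not move any buffer's position, lookups like this do not interfere with sequential reads over the same buffers, although concurrent use would still need its own synchronization.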
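This article draws on "内存文件映射方式读取超大文件踩坑题解析" (an analysis of a pitfall question about reading very large files with memory-mapped files).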