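My project needs to process a 180 GB gz file from upstream: read its contents, filter and deduplicate them, and pass the result on, still in gz format. Below are the approaches I considered.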
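1. Use the Linux less command directly to sidestep compression/decompression. In testing this was extremely slow, so I dropped it.
2. Java API GZIPOutputStream. Not tested yet; I don't expect it to perform very well.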
3. Single-threaded bash gzip/gunzip
gunzip took 16 minutes
gzip took 18 minutes
2022-08-01 13:21 started
2022-08-01 13:55 finished
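4. Multi-threaded bash gzip/gunzip
The bottleneck for IO work should be the disk, so I doubted that multiple threads would help much and decided to run a test. (Sanitized pseudocode.)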
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CopyFileMain {
    // Runs a shell command and returns its exit code, or -1 if it could not be run.
    static Integer callShell(String command) {
        try {
            Process p = Runtime.getRuntime().exec(command);
            return p.waitFor();
        } catch (Exception e) {
            e.printStackTrace();
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executorService = Executors.newFixedThreadPool(3);
        int fileSize = 3; // number of gunzip tasks submitted
        String command = "bash gunzip xx.gz"; // sanitized placeholder command
        List<Future<Integer>> list = new ArrayList<>();
        for (int i = 0; i < fileSize; i++) {
            list.add(executorService.submit(() -> callShell(command)));
        }
        // Block on each task instead of busy-polling isDone() in a loop.
        for (Future<Integer> future : list) {
            future.get();
        }
        System.out.println("finished");
        executorService.shutdown();
    }
}
gunzip took 13 minutes
Tentative conclusion: the bottleneck is disk IO.
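For approach 2, which was left untested above, here is a rough sketch of what a streaming filter-and-deduplicate pass with java.util.zip could look like. It is only an illustration under assumptions that the post does not state: the data is line-oriented UTF-8 text, the filter is a stand-in that drops blank lines, and all unique lines fit in an in-memory HashSet (which may well not hold for a 180 GB input). The class name GzFilterSketch and the method filterAndDedup are made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzFilterSketch {
    // Streams src line by line, drops blank lines (stand-in filter) and
    // duplicate lines, and writes the survivors to dst as gzip.
    // Returns the number of lines written.
    static long filterAndDedup(Path src, Path dst) throws IOException {
        // Assumption: the set of unique lines fits in the heap.
        Set<String> seen = new HashSet<>();
        long written = 0;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                 new GZIPInputStream(Files.newInputStream(src), 1 << 16),
                 StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 new GZIPOutputStream(Files.newOutputStream(dst), 1 << 16),
                 StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // hypothetical filter rule: skip blank lines
                }
                if (!seen.add(line)) {
                    continue; // already emitted this line
                }
                out.write(line);
                out.write('\n');
                written++;
            }
        } // closing the writer also finishes the gzip stream
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GzFilterSketch input.gz output.gz
        long n = filterAndDedup(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("lines written: " + n);
    }
}
```

Because the file is never fully decompressed to disk, this does less disk IO than a gunzip, process, gzip pipeline, but the compression work runs on a single thread, so it would need to be timed against the bash numbers above before drawing any conclusion.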