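My project needs to process a 180 GB gz file from upstream: read the contents, filter and deduplicate them, then pass the result back upstream, still in gz format. Below are several approaches.
1. Use the Linux less command directly to bypass compression/decompression. Measured performance was extremely poor, so I dropped it.
2. Java API GZIPOutputStream. Not yet tested; I don't expect it to perform very well.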
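Since approach 2 was left untested, here is only a minimal sketch of what it might look like, not a measured or recommended implementation. The class name GzipFilterDedup, the method filterAndDedup and the keep predicate are hypothetical names chosen for illustration. It assumes the data is line-oriented UTF-8 text and that the set of distinct lines fits in memory, which may well not hold for a 180 GB input.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipFilterDedup {
    /**
     * Streams lines out of a gz file, keeps the lines accepted by the filter,
     * drops exact duplicates, and writes the survivors to a new gz file.
     * Returns the number of lines written.
     */
    static long filterAndDedup(Path in, Path out, Predicate<String> keep) throws IOException {
        // Assumption: all distinct lines fit in memory at once.
        Set<String> seen = new HashSet<>();
        long written = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(Files.newInputStream(in), 1 << 16),
                     StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(out), 1 << 16),
                     StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Set.add returns false when the line was already seen.
                if (keep.test(line) && seen.add(line)) {
                    writer.write(line);
                    writer.write('\n');
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical filter: drop empty lines.
        long n = filterAndDedup(Paths.get(args[0]), Paths.get(args[1]),
                line -> !line.isEmpty());
        System.out.println("lines written: " + n);
    }
}
```

It would be run with the input and output paths as arguments, for example `java GzipFilterDedup in.gz out.gz`. Everything is streamed, so only the deduplication set grows with the input; if that set is too large, a different deduplication strategy would be needed.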
3. Single-threaded bash gzip/gunzip
gunzip took 16 minutes
gzip took 18 minutes
2022-08-01 13:21 started
2022-08-01 13:55 finished
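4. Multi-threaded bash gzip/gunzip
The bottleneck for IO operations should be the disk, so I suspected multithreading would not help much and decided to run a test. (Sanitized pseudocode)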
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CopyFileMain {
    // Runs an external command and returns its exit code, or -1 on failure.
    static Integer callShell(String... command) {
        try {
            Process p = new ProcessBuilder(command).inheritIO().start();
            return p.waitFor();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        int fileSize = 3;
        ExecutorService executorService = Executors.newFixedThreadPool(fileSize);
        List<Future<Integer>> list = new ArrayList<>();
        for (int i = 0; i < fileSize; i++) {
            // Sanitized placeholder; in a real run each task needs its own file.
            list.add(executorService.submit(() -> callShell("gunzip", "xx.gz")));
        }
        for (Future<Integer> future : list) {
            future.get(); // block until the task finishes instead of busy-waiting
        }
        System.out.println("finished");
        executorService.shutdown();
    }
}
gunzip took: 13 minutes
Tentative conclusion: the bottleneck is disk IO.