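My project needs to process 180 GB of .gz files received from upstream: read the file contents, filter and deduplicate them, then pass the result downstream, still in gzip format. Below are the approaches I considered.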

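1. Use the Linux `less` command directly to read the files without an explicit decompress/recompress step. In testing this was extremely slow, so I abandoned it.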

2. The Java API `GZIPOutputStream`: not yet tested, but I don't expect it to perform well.
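An untested sketch of approach 2 for reference. It assumes the data is line-oriented UTF-8 text and that "deduplicate" means dropping repeated identical lines. It streams the input through `GZIPInputStream`, applies a filter, remembers seen lines in an in-memory `HashSet`, and writes through `GZIPOutputStream`, so the decompressed data is never written to disk. The class name `GzFilterDedup`, the `process` signature, the 64 KiB buffers, and the blank-line filter are hypothetical. Because the `HashSet` holds every distinct line, 180 GB of input would probably need a disk-backed or sort-based dedup instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzFilterDedup {

    // Streams a .gz file line by line, keeps lines accepted by `keep`,
    // drops lines already seen, and writes the survivors to a new .gz file.
    // Returns the number of lines written.
    static long process(Path in, Path out, Predicate<String> keep) throws IOException {
        Set<String> seen = new HashSet<>(); // assumption: distinct lines fit in memory
        long written = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(Files.newInputStream(in), 1 << 16), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(out), 1 << 16), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Set.add returns false for duplicates, so each distinct line is written once
                if (keep.test(line) && seen.add(line)) {
                    writer.write(line);
                    writer.newLine();
                    written++;
                }
            }
        } // closing the writer finishes the gzip trailer
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical filter: drop blank lines
        long n = process(Paths.get(args[0]), Paths.get(args[1]), s -> !s.isBlank());
        System.out.println(n + " lines written");
    }
}
```

Usage: `java GzFilterDedup input.gz output.gz`. Output keeps the first occurrence of each line in input order.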

3. Single-threaded bash gzip/gunzip
gunzip took 16 minutes
gzip took 18 minutes

2022-08-01 13:21 started
2022-08-01 13:55 finished
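The started/finished lines above could come from a small timing wrapper like the one below. This is a minimal sketch, assuming the single-threaded run is a gunzip followed by a gzip of the same file; `TimedStep`, `timed`, `run`, and the placeholder names `xx.gz`/`xx` are hypothetical. Elapsed time is measured with `System.nanoTime()` because it is monotonic; the wall-clock timestamps are only used for the log lines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TimedStep {
    // Same layout as the log lines: "2022-08-01 13:21"
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Runs a step, prints "started"/"finished" with timestamps, and returns the elapsed time
    static Duration timed(Runnable step) {
        System.out.println(LocalDateTime.now().format(FMT) + " started");
        long t0 = System.nanoTime();
        step.run();
        Duration elapsed = Duration.ofNanos(System.nanoTime() - t0);
        System.out.println(LocalDateTime.now().format(FMT) + " finished");
        return elapsed;
    }

    // Runs an external command in the foreground and returns its exit code
    static int run(String... cmd) {
        try {
            return new ProcessBuilder(cmd).inheritIO().start().waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        // Placeholder file names: gunzip xx.gz produces xx, then gzip xx recompresses it
        Duration d = timed(() -> {
            run("gunzip", "xx.gz");
            run("gzip", "xx");
        });
        System.out.println("elapsed: " + d.toMinutes() + " min");
    }
}
```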

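4. Multi-threaded bash gzip/gunzip
Since the I/O bottleneck should be the disk, I suspected multithreading would not help much, and decided to test it. (Anonymized code)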

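```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CopyFileMain {

    // Runs an external command, waits for it to exit, and returns its exit code (-1 on failure)
    static Integer callShell(String... command) {
        try {
            // Pass the program and its arguments separately: "bash gunzip xx.gz" would make bash
            // treat "gunzip" as a script file. inheritIO() keeps stderr from filling a pipe buffer.
            Process p = new ProcessBuilder(command).inheritIO().start();
            return p.waitFor();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executorService = Executors.newFixedThreadPool(3);
        int fileCount = 3;
        List<Future<Integer>> list = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            // Anonymized placeholder names; each task must decompress a different file
            String file = "xx" + i + ".gz";
            list.add(executorService.submit(() -> callShell("gunzip", file)));
        }
        // Block on each Future instead of busy-waiting in a while (true) loop
        for (Future<Integer> future : list) {
            future.get();
        }
        System.out.println("finished");
        // shutdown() belongs on the executor inside main; otherwise the pool threads keep the JVM alive
        executorService.shutdown();
    }
}
```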

gunzip took 13 minutes (vs. 16 minutes single-threaded).
Tentative conclusion: the bottleneck is disk I/O.
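One rough way to check the disk-I/O conclusion: measure how fast the disk can simply read one of the files sequentially, and compare that with the rate gunzip achieves (file size divided by gunzip time). If the two are close, the disk is likely the limit; if gunzip is much slower than a plain read, decompression itself (CPU) is more likely the limit. The sketch below is illustrative only; the class name `ReadThroughput` and the 1 MiB buffer are my own choices. Run it on a file that is not already in the OS page cache, otherwise the number reflects memory speed rather than the disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadThroughput {

    // Reads the whole file sequentially with a large buffer and returns the throughput in MB/s
    static double readMBps(Path file) throws IOException {
        byte[] buf = new byte[1 << 20];
        long bytes = 0;
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n; // only count bytes; no processing, so this approximates raw read speed
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return bytes / 1e6 / seconds;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.printf("%.1f MB/s%n", readMBps(file));
    }
}
```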