Hash (MD2) elements of a List from a large file (~5GB), using ExecutorService, invokeAll extremely slow

Issue

This content is from Stack Overflow. Question asked by Thend.

I’m hashing a large file with MD2 in parallel using ExecutorService. The process is as follows:
First, the “inputparser3” method decides whether the input is text or a file. If it is a file larger than 100 MB, the method splits it into 10 MB chunks and stores them in a List of byte[]. Second, the “encrypt3” method hashes the chunks in parallel using ExecutorService, for example with 6 threads. Finally, I use the list of futures (holding the hashes of the individual chunks) to generate a master hash (root hash).
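
The scheme described above (split, hash each chunk, hash the joined chunk hashes) can be sketched on a small in-memory input. This is only an illustration under assumptions: the class `RootHashSketch` and its method names are hypothetical, the algorithm is a parameter because the question text says MD2 while the posted code requests "MD5", and the sketch always builds a root hash, even for a single chunk. One thing it makes visible is that the root hash is a different value from the plain hash of the whole input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the question's code.
final class RootHashSketch {

    // Hex-encodes a digest (two lowercase hex digits per byte).
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hashes one byte array with a fresh MessageDigest instance.
    static String hash(String algorithm, byte[] data) {
        try {
            return toHex(MessageDigest.getInstance(algorithm).digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    // Splits data into chunks of at most chunkSize bytes; every chunk is its own array.
    static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunkSize) {
            chunks.add(Arrays.copyOfRange(data, from, Math.min(data.length, from + chunkSize)));
        }
        return chunks;
    }

    // Root hash: the hash of the concatenated hex hashes of all chunks, in order.
    static String rootHash(String algorithm, List<byte[]> chunks) {
        StringBuilder joined = new StringBuilder();
        for (byte[] chunk : chunks) {
            joined.append(hash(algorithm, chunk));
        }
        return hash(algorithm, joined.toString().getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] data = "a small stand-in for the contents of a large file"
                .getBytes(StandardCharsets.UTF_8);
        List<byte[]> chunks = split(data, 16);
        System.out.println("chunks:    " + chunks.size());
        System.out.println("root hash: " + rootHash("MD5", chunks));
        System.out.println("full hash: " + hash("MD5", data));
    }
}
```

Because the root hash depends on the chunk size, two runs only agree if they split the input the same way.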

For the example, I downloaded a 5 GB test file from this website: https://testfiledownload.com/
When I tested my code, I noticed that it takes forever to finish: about 17 minutes (roughly 1000 s). Without ExecutorService and without splitting the file into chunks, it takes only about 6 minutes (roughly 400 s) on average.
I noticed that threadPool.invokeAll(callableTasks) takes most of the time, and I believe that is because it has to execute 500 chunks on 6 threads.
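
One point that may help when reading this timing: invokeAll only returns once every submitted task has completed, so the time spent inside it covers the whole hashing run rather than scheduling overhead alone. A small sketch of that behaviour, where the class `InvokeAllTiming` is hypothetical and sleeping tasks stand in for the chunk hashing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; the sleeping tasks are placeholders for real work.
final class InvokeAllTiming {

    // Runs taskCount tasks of sleepMillis each on a fixed pool and returns
    // how many futures were already done at the moment invokeAll returned.
    static int completedOnReturn(int threads, int taskCount, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                tasks.add(() -> {
                    Thread.sleep(sleepMillis);
                    return id;
                });
            }
            int done = 0;
            // invokeAll blocks here until all tasks have finished.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                if (future.isDone()) {
                    done++;
                }
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = completedOnReturn(2, 6, 50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " of 6 tasks were done when invokeAll returned, after "
                + elapsedMs + " ms");
    }
}
```

To see where the time really goes, it would make sense to time reading the file and hashing the chunks separately.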

My questions are the following:
- Is it possible to run 6 chunks at a time? I mean, when the first 6 finish, start the next 6, and so on.
- Is it possible to improve this somehow?
- Should I completely remove the parallelization?
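
On the first question: a fixed thread pool of 6 threads already runs at most 6 tasks at a time, and the remaining tasks wait in the pool's queue. What the pool does not limit is how many chunks are held in memory before hashing starts. A possible sketch of reading and hashing with a bounded number of chunks in flight follows. It rests on several assumptions: Java 11 or later (for `InputStream.readNBytes`), a hypothetical class `BoundedChunkHasher` with a hypothetical `maxInFlight` parameter, the algorithm passed in as a string because the question text says MD2 while the posted code requests "MD5", and a root hash that is built even for a single chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical helper class, not part of the question's code.
final class BoundedChunkHasher {

    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // A fresh MessageDigest per call: instances are not safe to share between threads.
    static byte[] digest(String algorithm, byte[] data) {
        try {
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    // Reads the stream chunk by chunk; a chunk is only read once a permit is free,
    // so roughly maxInFlight chunks are being held and hashed at any moment.
    static String rootHash(InputStream in, int chunkSize, int maxInFlight, String algorithm) {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore permits = new Semaphore(maxInFlight);
        List<Future<String>> futures = new ArrayList<>();
        try {
            while (true) {
                permits.acquire();                        // wait for a free slot
                byte[] chunk = in.readNBytes(chunkSize);  // new array for every chunk
                if (chunk.length == 0) {                  // end of stream
                    permits.release();
                    break;
                }
                futures.add(pool.submit(() -> {
                    try {
                        return toHex(digest(algorithm, chunk));
                    } finally {
                        permits.release();                // free the slot when hashing ends
                    }
                }));
            }
            StringBuilder joined = new StringBuilder();
            for (Future<String> future : futures) {       // submission order is kept
                joined.append(future.get());
            }
            return toHex(digest(algorithm, joined.toString().getBytes(StandardCharsets.US_ASCII)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while hashing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk could not be hashed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in; for a real file, a stream obtained from
        // Files.newInputStream(path) could be passed instead.
        InputStream in = new ByteArrayInputStream(new byte[1000]);
        System.out.println(rootHash(in, 64, 6, "MD5"));
    }
}
```

Whether this ends up faster than single-threaded hashing depends on whether the disk or the hash function is the bottleneck, which only a measurement on the actual machine can tell.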

My code (in main method):

String largefile = "C:\\Users\\Thend\\Desktop\\test_5gb\\5gb.test";

StopWatch st = new StopWatch();
st.start();
String hash = encrypt3(Objects.requireNonNull(inputparser3(largefile, "FILE")));
st.stop();
System.out.println("Hash: "+hash+" | Execution time: "+String.format("%.5f", st.getTotalTimeMillis() / 1000.0f)+"sec");

The methods:

public static List<byte[]> inputparser3(String PathOrText, String TYPE) throws IOException {
    if (TYPE.equals("TXT")) {

        List<byte[]> temp = new ArrayList<>();
        if(PathOrText.getBytes(StandardCharsets.UTF_8).length > 10485760){ // 10 MB = 10485760 Bytes (in binary)

            // Add chunk by chunk
            ByteArrayInputStream in = new ByteArrayInputStream(PathOrText.getBytes(StandardCharsets.UTF_8));
            byte[] buffer = new byte[1048576]; // 1 MB chunk
            int len;
            while ((len = in.read(buffer)) > 0) {
                temp.add(Arrays.copyOf(buffer, len)); // copy: the buffer is overwritten by the next read
            }
            return temp;

        } else {

            // Add whole
            temp.add(PathOrText.getBytes(StandardCharsets.UTF_8));
            return temp;

        }
    } else if (TYPE.equals("FILE")) {

        List<byte[]> temp = new ArrayList<>();
        if(Files.size(Path.of(PathOrText)) > 104857600) { // 100 MB = 104857600 Bytes (in binary)
            // Add chunk by chunk
            try (FileInputStream fis = new FileInputStream(PathOrText)) {
                byte[] buffer = new byte[10485760]; //  10 MB chunk
                int len;
                while ((len = fis.read(buffer)) > 0) {
                    temp.add(Arrays.copyOf(buffer, len)); // copy: the buffer is overwritten by the next read
                }
                return temp;
            }
        } else {
            // Add whole (the buffer must be sized by the file, not by the length of the path string)
            temp.add(Files.readAllBytes(Path.of(PathOrText)));
            return temp;
        }
    }
    System.err.println("Input type cannot be recognized! Use 'TXT' for text or 'FILE' for file.");
    return null;
}
public static String encrypt3(List<byte[]> list) throws ExecutionException, InterruptedException, NoSuchAlgorithmException {

    // If there is MORE THAN ONE element in the list
    if(list.size() > 1){

        ExecutorService threadPool = Executors.newFixedThreadPool(6);
        List<Callable<String>> callableTasks = new ArrayList<>();

        list.forEach((n)->{
            Callable<String> callableTask = () -> {
                // MessageDigest and StringBuilder are not thread-safe,
                // so every task works on its own instances
                MessageDigest md = MessageDigest.getInstance("MD5");
                return toHex(md.digest(n));
            };
            callableTasks.add(callableTask);
        });

        // invokeAll blocks until every task has completed
        List<Future<String>> futures;
        try {
            futures = threadPool.invokeAll(callableTasks);
        } finally {
            threadPool.shutdown();
        }

        ArrayList<String> temp = new ArrayList<>();
        for (Future<String> future : futures) {
            temp.add(future.get());
        }

        // Create a master hash (root hash)
        MessageDigest md = MessageDigest.getInstance("MD5");
        return toHex(md.digest(String.join("", temp).getBytes(StandardCharsets.UTF_8)));

        // If there is ONLY ONE element in the list
    } else {

        MessageDigest md = MessageDigest.getInstance("MD5");
        return toHex(md.digest(list.get(0)));
    }
}

// Converts a digest to a lowercase hex string
private static String toHex(byte[] hashed_bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte hashed_byte : hashed_bytes) {sb.append(Integer.toString((hashed_byte & 0xff) + 0x100, 16).substring(1));}
    return sb.toString();
}
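
A note on the chunk list built in inputparser3: the method fills one buffer repeatedly in a loop, and a list built that way only holds distinct chunks if each element is its own array. The sketch below contrasts the two cases on a tiny input; `ChunkCopyDemo` and its method names are hypothetical and only serve as illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
final class ChunkCopyDemo {

    // Stores a reference to the one reused buffer: every list element is the same array.
    static List<byte[]> readSharingBuffer(InputStream in, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try {
            while (in.read(buffer) > 0) {
                chunks.add(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    // Stores a copy of the bytes actually read: every list element is a separate chunk.
    static List<byte[]> readCopyingChunks(InputStream in, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try {
            int len;
            while ((len = in.read(buffer)) > 0) {
                chunks.add(Arrays.copyOf(buffer, len));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5};
        List<byte[]> shared = readSharingBuffer(new ByteArrayInputStream(data), 2);
        List<byte[]> copied = readCopyingChunks(new ByteArrayInputStream(data), 2);
        System.out.println("shared, first element: " + Arrays.toString(shared.get(0)));
        System.out.println("copied, first element: " + Arrays.toString(copied.get(0)));
    }
}
```

With a shared buffer, every task would hash whatever the last read left in that one array, so the chunk hashes would not describe the file.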



Solution

This question has not been answered yet.

This question and answer are collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
