Compression ratio is different in ZSTD algorithm between ZstdOutputStream and ZstdCompressor.compress(Bytebuffer) #187

Closed
believezzd opened this issue Feb 21, 2024 · 2 comments

Comments

@believezzd

believezzd commented Feb 21, 2024

Description

  • I have implemented two ways to compress a large file (40 MB+).
  • One uses ZstdCompressor.compress with ByteBuffer arguments.
  • The other uses ZstdOutputStream.
  • ZstdCompressor produces 12128084 bytes from 40527865 (a compression ratio of about 0.30).
  • ZstdOutputStream produces 18109016 bytes from 40527865 (a compression ratio of about 0.45).
  • The target file is 40527865 bytes.
  • ZstdCompressor achieves a compression ratio similar to zstd-jni (https://github.com/luben/zstd-jni).
  • Something must be wrong that I can't figure out, so I am asking for help.

Aircompressor Version

<dependency>
      <groupId>io.airlift</groupId>
      <artifactId>aircompressor</artifactId>
      <version>0.26</version>
</dependency>

Code

ZstdCompressor.compress(ByteBuffer)

public static long compressFile(String inFileName, String outFileName) throws IOException {
    File inFile = new File(inFileName);
    File outFile = new File(outFileName);

    long numBytes = 0L;

    ByteBuffer inBuffer = ByteBuffer.allocateDirect(8*1024*1024); 
    ByteBuffer outBuffer = ByteBuffer.allocateDirect(8*1024*1024);
    try (RandomAccessFile inRaFile = new RandomAccessFile(inFile, "r"); 
        RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
        FileChannel inChannel = inRaFile.getChannel();
        FileChannel outChannel = outRaFile.getChannel()) {

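        ZstdCompressor compressor = new ZstdCompressor();
        inBuffer.clear();
        while(inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            outBuffer.clear();

            // Count the uncompressed bytes so the return value is meaningful.
            numBytes += inBuffer.remaining();
            // Each call compresses the current chunk (up to 8 MiB) on its own.
            compressor.compress(inBuffer, outBuffer);

            outBuffer.flip();
            outChannel.write(outBuffer);
            inBuffer.clear();
        }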
    }

    return numBytes;
}

ZstdOutputStream

public static long compressFile(String inFileName, String outFileName) throws IOException {
    File inFile = new File(inFileName);
    File outFile = new File(outFileName);

    long numBytes = 0L;
    byte[] buffer = new byte[1024 * 1024 * 8];

    FileInputStream fi = null;
    FileOutputStream fo = null;

    try {
        fi = new FileInputStream(inFile);
        fo = new FileOutputStream(outFile);

        try (ZstdOutputStream zs = new ZstdOutputStream(fo)) {
            while (true) {
                // fi.read returns the number of uncompressed bytes read, not a compressed size.
                int bytesRead = fi.read(buffer, 0, buffer.length);
                if (bytesRead == -1) {
                    break;
                }

                zs.write(buffer, 0, bytesRead);

                numBytes += bytesRead;
            }
        }
    } catch (Exception ex) {
        log.error("Error: ", ex);
    } finally {
        IOUtils.closeQuietly(fi);
        IOUtils.closeQuietly(fo);
    }

    return numBytes;
}

File to Compress

Computer

  • Intel Core i5
  • MacBook Pro
  • macOS Sonoma 14.0

JDK

  • 1.8.0_311
@believezzd
Author

@martint

Could you help me with this?

@dain
Member

dain commented Mar 18, 2024

The answer is that they are very different compression techniques. ZstdCompressor is a block compressor, which means it compresses a block of data in memory to an output buffer in memory in one shot. This requires the full input and output buffers to fit into memory. ZstdOutputStream is a stream compressor, which chops the input data into chunks and uses the block compressor to compress each chunk. This means only part of the data needs to fit into memory at a time, but it doesn't compress quite as well (it also adds extra data to the output describing the framing and such). BTW, what I am describing applies to basically every compression algorithm.
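
To illustrate the "basically every compression algorithm" point, here is a minimal sketch that uses the JDK's java.util.zip.Deflater instead of aircompressor. The class name ChunkedVsOneShot and the helpers compressedSize and sampleData are hypothetical, invented for this illustration. It only shows the general effect of compressing chunks independently; it does not reproduce how ZstdOutputStream works internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class ChunkedVsOneShot {

    // Compresses data in independent chunks of chunkSize bytes (chunkSize must be > 0)
    // and returns the total number of compressed bytes produced.
    static int compressedSize(byte[] data, int chunkSize) {
        int total = 0;
        byte[] scratch = new byte[8192];
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset);
            // A fresh Deflater per chunk: no history is shared between chunks,
            // and every chunk pays for its own header and trailer.
            Deflater deflater = new Deflater();
            deflater.setInput(data, offset, length);
            deflater.finish();
            while (!deflater.finished()) {
                total += deflater.deflate(scratch);
            }
            deflater.end();
        }
        return total;
    }

    // Builds repetitive, text-like sample data (an assumption made for the demo).
    static byte[] sampleData(int size) {
        String[] words = {"compress", "stream", "block", "buffer", "frame", "chunk", "zstd", "ratio"};
        Random random = new Random(42); // fixed seed keeps runs repeatable
        StringBuilder text = new StringBuilder(size + 16);
        while (text.length() < size) {
            text.append(words[random.nextInt(words.length)]).append(' ');
        }
        return text.substring(0, size).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = sampleData(1 << 20);
        System.out.println("one shot:      " + compressedSize(data, data.length));
        System.out.println("64 KiB chunks: " + compressedSize(data, 64 * 1024));
        System.out.println("1 KiB chunks:  " + compressedSize(data, 1024));
    }
}
```

On data like this, the smaller the chunks, the larger the total output should be, because each chunk starts without any history and carries its own framing bytes. The exact numbers depend on the data and on the algorithm.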

@dain dain closed this as completed Mar 18, 2024