
Utilize a parallel gzip implementation #25


@USA-RedDragon commented Mar 8, 2022

What is this?

This PR would allow parallel compression and improved decompression for slugs. I used a few cloned Git repos from HashiCorp to test the performance difference across varying repository sizes, mostly because TFE utilizes this code and VCS repos are typically larger than 1MB.

This is based on some older work here (https://github.com/hashicorp/go-service/pull/29), which should help with VCS ingress speeds, as this code is still utilized by the slug ingress container, per https://github.com/hashicorp/slug-ingress/blob/main/worker.go#L170-L180 (and the import therein), unless I'm looking at old code. If these patches, as well as #21 (great work!), were merged, significant speedups might be had in TFE/TFC/Agents.

This could considerably reduce the time taken for source code ingress on TFE, especially for customers with large monorepos.

The library that provides the parallel gzip drop-in replacement is https://github.com/klauspost/pgzip.

It is important to note that this library creates and reads standard gzip files. You do not have to match the compressor/decompressor to get the described speedups, and the gzip files are fully compatible with other gzip readers/writers.
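
For illustration, here's a minimal sketch of what the drop-in swap looks like. The compress helper and the tuning values below are hypothetical, not the actual diff in this PR:

package example

import (
	"io"

	// Swapping the stdlib's "compress/gzip" import for this one is
	// essentially the whole change: pgzip exposes the same
	// NewWriter/NewReader surface and emits standard gzip streams.
	gzip "github.com/klauspost/pgzip"
)

// compress copies src into dst as a gzip stream, compressed in parallel.
func compress(dst io.Writer, src io.Reader) error {
	zw := gzip.NewWriter(dst)
	// Optional tuning: compress 1MB blocks on up to 8 goroutines
	// (illustrative values, not benchmarked here).
	if err := zw.SetConcurrency(1<<20, 8); err != nil {
		zw.Close()
		return err
	}
	if _, err := io.Copy(zw, src); err != nil {
		zw.Close()
		return err
	}
	return zw.Close()
}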

As you can see from the results below, utilizing parallel gzip to compress these slugs can reduce the time it takes by up to 10x in some cases.

Decompression sees a speed-up as well, even though gzip decompression is single-threaded. Here's an excerpt from the library's README explaining why it claims a 104% decompression speedup over golang's implementation:

But wait, since gzip decompression is inherently singlethreaded (aside from CRC calculation) how can it be more than 100% faster? Because pgzip due to its design also acts as a buffer. When using unbuffered gzip, you are also waiting for io when you are decompressing. If the gzip decoder can keep up, it will always have data ready for your reader, and you will not be waiting for input to the gzip decompressor to complete.
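
To make the buffering point concrete, here's a small sketch (not from this PR) using pgzip's NewReaderN, which controls how far the decoder reads ahead of the consumer. The file name, block size, and block count are illustrative assumptions:

package main

import (
	"io"
	"log"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("slug.tar.gz") // hypothetical input
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// NewReaderN keeps up to `blocks` blocks of `blockSize` bytes read and
	// decompressed ahead of the caller, which is the buffering effect the
	// README describes: the consumer rarely stalls waiting on input.
	zr, err := pgzip.NewReaderN(f, 1<<20, 8)
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()

	if _, err := io.Copy(io.Discard, zr); err != nil {
		log.Fatal(err)
	}
}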

Testing environment

HashiCorp-provided Lenovo X1 Carbon on Arch Linux. 4c/8t i7-8665U.

File sizes via du -sh. I pulled a few random repos from our GitHub, plus a large Android source repository (platform_frameworks_base), just to show the speedups in large workloads:

279M    atlas
437M    consul
580K    go-service
324K    is-immutable-aws-vault-consul
11G     platform_frameworks_base

Compression code

package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/USA-RedDragon/go-slug"
)

func main() {
	if len(os.Args[1:]) != 2 {
		fmt.Printf("Usage: %v <input_dir> <output_file>\n", os.Args[0])
		os.Exit(1)
	}
	f, err := os.Create(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}
	// Defers run last-in-first-out, so Close is registered first
	// to ensure Sync runs while the file is still open.
	defer f.Close()
	defer f.Sync()

	// Then call the Pack function with a directory path containing the
	// configuration files and an io.Writer to write the slug to.
	defer duration(track("slug-pack"))
	if _, err := slug.Pack(os.Args[1], f, false); err != nil {
		log.Fatal(err)
	}
}

func track(msg string) (string, time.Time) {
	return msg, time.Now()
}

func duration(msg string, start time.Time) {
	fmt.Printf("%v: %v\n", msg, time.Since(start))
}

Decompression code

package main

import (
	"bufio"
	"fmt"
	"os"
	"time"

	"github.com/hashicorp/go-slug"
)

func main() {
	if len(os.Args[1:]) != 2 {
		fmt.Printf("Usage: %v <input_filer> <output_dir>\n", os.Args[0])
		os.Exit(1)
	}
	err := os.Mkdir(os.Args[2], 0755)
	if err != nil {
		fmt.Printf("Failed to create output directory: %s: %s\n", os.Args[2], err)
		os.Exit(1)
	}
	file, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Printf("Failed to open input slug: %s: %s\n", os.Args[1], err)
		os.Exit(1)
	}
	defer file.Close()
	reader := bufio.NewReader(file)
	defer duration(track("slug-unpack"))
	err = slug.Unpack(reader, os.Args[2])
	if err != nil {
		fmt.Printf("Failed to unpackage slug: %s", err)
		os.Exit(1)
	}
}

func track(msg string) (string, time.Time) {
	return msg, time.Now()
}

func duration(msg string, start time.Time) {
	fmt.Printf("%v: %v\n", msg, time.Since(start))
}

Compression Results

Atlas

Implementation | Time to compress (5 runs) | File size
golang gzip | 1.3s, 1.47s, 1.39s, 1.38s, 1.26s | 7.6M
pgzip | 602.62ms, 610.72ms, 634.77ms, 606.53ms, 585.98ms | 8M

Consul

Implementation | Time to compress (5 runs) | File size
golang gzip | 2.99s, 2.53s, 2.64s, 2.97s, 2.67s | 23M
pgzip | 525.3ms, 499.16ms, 546.67ms, 522.06ms, 529.79ms | 24M

go-service

Implementation | Time to compress (5 runs) | File size
golang gzip | 11.65ms, 10.16ms, 9.6ms, 9.67ms, 11.57ms | 35K
pgzip | 9.52ms, 8.13ms, 7.77ms, 11.35ms, 8.66ms | 36K

is-immutable-aws-vault-consul

Here we see a great example of the <1MB case being slightly slower. On average, pgzip takes about 1ms longer (3.87ms vs 2.81ms), roughly 38% more time (equivalently, golang gzip takes 1 - (2.81/3.87) ≈ 27% less).

Implementation | Time to compress (5 runs) | File size
golang gzip | 3.92ms, 2.54ms, 2.51ms, 2.47ms, 2.61ms | 437B
pgzip | 4.43ms, 5.57ms, 3.12ms, 2.96ms, 3.26ms | 441B

platform_frameworks_base

Here we see a great example of the speedups this patch is capable of. This is an extreme, 11-gigabyte example, but it serves to show how well this method scales.

Implementation | Time to compress (5 runs) | File size
golang gzip | 1m1.47s, 1m3.24s, 58.23s, 57.22s, 55.81s | 949M
pgzip | 6.38s, 5.94s, 6.30s, 7.4s, 7.1s | 971M

Decompression Results

Decompression shows a slight improvement on average; not much to talk about.

Atlas

Implementation | Time to decompress (5 runs)
golang gzip | 874.65ms, 749.19ms, 874.72ms, 1.08s, 779.53ms
pgzip | 649.99ms, 694.72ms, 1.04s, 539.59ms, 610.86ms

Consul

Implementation | Time to decompress (5 runs)
golang gzip | 1.47s, 1.11s, 1.52s, 1.16s, 1.21s
pgzip | 507.07ms, 799.67ms, 680.70ms, 873.84ms, 731.31ms

go-service

Here we see the decompression speed gap close for such a small amount of data.

Implementation | Time to decompress (5 runs)
golang gzip | 3.68ms, 3.15ms, 4.08ms, 2.89ms, 4.12ms
pgzip | 3.31ms, 4.62ms, 4.14ms, 4.49ms, 2.84ms

is-immutable-aws-vault-consul

Here we see decompression is much slower on pgzip for such a small amount of data. This is likely due to the overhead of spinning up goroutines.

Implementation | Time to decompress (5 runs)
golang gzip | 171ns, 197ns, 280ns, 106ns, 121ns
pgzip | 762ns, 455ns, 639ns, 551ns, 700ns

platform_frameworks_base

Implementation | Time to decompress (5 runs)
golang gzip | 18.52s, 21.15s, 22.92s, 22.07s, 23.05s
pgzip | 13.33s, 16.02s, 18.21s, 17.14s, 14.69s

@hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@brandonc (Contributor) commented:

@USA-RedDragon I experimented with this version of go-slug in Terraform with a working directory that contained about 800MB of incompressible junk. Using just "terraform plan" with cloud config, my runs realized 20 to 40 seconds of speedup on the initial, local pack, but this was a fraction of the total time it takes to upload the slug, dequeue the job, download it, execute terraform plan, pack it again, and upload the new slug (2m40s in my baseline).

BUT, as you alluded to in the PR description, since there are two places we use go-slug in the pipeline (Preparing to plan + Waiting to apply) this optimization could potentially pay double and really move the needle on total planning time.

I also PR'd an alternative that allows a configurable compression level with compress/gzip. "BestSpeed" (level 1) achieves a 3x speedup at the cost of ~10% larger output than the default (level 6). We could apply both optimizations, but if there were apprehension about replacing compress/gzip, this could be a viable alternative.
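
For reference, the stdlib alternative described above is just a matter of passing a level to gzip.NewWriterLevel. A minimal sketch under that assumption (the file name and stdin source are illustrative, not the actual PR):

package main

import (
	"compress/gzip"
	"io"
	"log"
	"os"
)

func main() {
	out, err := os.Create("slug.tar.gz") // hypothetical output
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// gzip.BestSpeed (level 1) trades roughly 10% larger output for about a
	// 3x compression speedup versus gzip.DefaultCompression (level 6), per
	// the numbers quoted in the comment above.
	zw, err := gzip.NewWriterLevel(out, gzip.BestSpeed)
	if err != nil {
		log.Fatal(err)
	}
	defer zw.Close()

	if _, err := io.Copy(zw, os.Stdin); err != nil {
		log.Fatal(err)
	}
}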

@USA-RedDragon (Author) commented:

Hey, I forgot about submitting this! 😄

Glad to see it get some activity. I'm not pushing for this change or anything, but it does seem to be some low hanging fruit.

@brandonc (Contributor) commented:

Even though this change makes go-slug compress slugs faster, I think there is an apprehension about replacing compress/gzip with klauspost/pgzip because of the added dependency on something other than the standard library. But there is also an acknowledgement that the compression step is slow. Therefore, if we're stuck with single-threaded execution, I've changed the default compression level to BestSpeed, which requires much less compute time (at the expense of a less favorable compression ratio).

I think this is a much better default for TFE, and it changes the impact of multithreaded compression and the calculus of whether or not to adopt it. We'll continue to monitor the performance of this step in the pipeline to see whether this was adequate. Thanks for your patience!

@brandonc brandonc closed this Nov 23, 2022