[llvm-objcopy] Add --compress-sections
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section`-like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding (the last matching option wins), an order-independent `!`
would be confusing. Therefore, `!` is disallowed.
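
The override behaviour described above can be pictured with a small stand-alone sketch. It is not the llvm-objcopy implementation: `Format`, `Rule`, `globMatch` and `resolve` are hypothetical names, and the matcher only understands `*` and `?`, whereas the real tool relies on LLVM's own pattern matching. The point is only that rules are scanned in command-line order and the last matching one decides the format.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::DebugCompressionType.
enum class Format { None, Zlib, Zstd };

// Minimal glob: '*' matches any run of characters, '?' matches exactly one.
// Assumption: character classes and escapes are deliberately left out.
bool globMatch(std::string_view Pat, std::string_view Name) {
  const std::size_t NPos = std::string_view::npos;
  std::size_t P = 0, N = 0, StarP = NPos, StarN = 0;
  while (N < Name.size()) {
    if (P < Pat.size() && Pat[P] == '*') {
      StarP = P++; // remember the star and first try matching zero characters
      StarN = N;
    } else if (P < Pat.size() && (Pat[P] == '?' || Pat[P] == Name[N])) {
      ++P;
      ++N;
    } else if (StarP != NPos) {
      P = StarP + 1; // backtrack: let the last star absorb one more character
      N = ++StarN;
    } else {
      return false;
    }
  }
  while (P < Pat.size() && Pat[P] == '*')
    ++P;
  return P == Pat.size();
}

using Rule = std::pair<std::string, Format>;

// Returns the format requested for Section, or nullopt if no rule matches.
// Rules are scanned in order, so a later match overrides an earlier one.
std::optional<Format> resolve(const std::vector<Rule> &Rules,
                              std::string_view Section) {
  std::optional<Format> Result;
  for (const auto &[Pattern, Fmt] : Rules) {
    // Negative patterns are assumed to have been rejected during parsing.
    assert((Pattern.empty() || Pattern[0] != '!') && "negative pattern");
    if (globMatch(Pattern, Section))
      Result = Fmt;
  }
  return Result;
}
```

With the rules `*0=zlib`, `*0=none`, `nomatch=none` (the combination used in the new test), `resolve` yields `Format::None` for `foo0` and no value for `foo1`.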

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Note: Before this patch, a compressed relocation section was also
recognized as a `RelocationSectionBase`, so the `!ToRemove(*ToRelSec)`
check in `removeSections` could incorrectly treat a `CompressedSection`
as a `RelocationSectionBase`, leading to a ubsan failure for the new test.
Fix this by setting `OriginalFlags` in `CompressedSection::CompressedSection`.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
MaskRay committed Apr 15, 2024
1 parent 302d0f3 commit 0794298
Showing 10 changed files with 283 additions and 8 deletions.
8 changes: 8 additions & 0 deletions llvm/docs/CommandGuide/llvm-objcopy.rst
@@ -309,6 +309,14 @@ them.
Compress DWARF debug sections in the output, using the specified format.
Supported formats are ``zlib`` and ``zstd``. Use ``zlib`` if ``<format>`` is omitted.

.. option:: --compress-sections <section>=<format>

Compress or decompress sections matched by ``<section>`` using the specified
format. Supported formats are ``zlib`` and ``zstd``. Specify ``none`` for
decompression. When a section is matched by multiple options, the last one
wins. A wildcard ``<section>`` starting with '!' is disallowed.
Sections within a segment cannot be (de)compressed.

.. option:: --decompress-debug-sections

Decompress any compressed DWARF debug sections in the output.
4 changes: 4 additions & 0 deletions llvm/docs/ReleaseNotes.rst
@@ -186,6 +186,10 @@ Changes to the LLVM tools
for ELF input to skip the specified symbols when executing other options
that can change a symbol's name, binding or visibility.

* llvm-objcopy now supports ``--compress-sections`` to compress or decompress
arbitrary sections not within a segment.
(`#85036 <https://github.com/llvm/llvm-project/pull/85036>`_.)

* llvm-profgen now supports COFF+DWARF binaries. This enables Sample-based PGO
on Windows using Intel VTune's SEP. For details on usage, see the `end-user
documentation for SPGO
3 changes: 3 additions & 0 deletions llvm/include/llvm/ObjCopy/CommonConfig.h
@@ -262,6 +262,9 @@ struct CommonConfig {
bool DecompressDebugSections = false;

DebugCompressionType CompressionType = DebugCompressionType::None;

SmallVector<std::pair<NameMatcher, llvm::DebugCompressionType>, 0>
compressSections;
};

} // namespace objcopy
34 changes: 26 additions & 8 deletions llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
@@ -215,23 +215,41 @@ static Error dumpSectionToFile(StringRef SecName, StringRef Filename,
 }

 Error Object::compressOrDecompressSections(const CommonConfig &Config) {
-  // Build a list of the debug sections we are going to replace.
-  // We can't call `AddSection` while iterating over sections,
+  // Build a list of sections we are going to replace.
+  // We can't call `addSection` while iterating over sections,
   // because it would mutate the sections array.
   SmallVector<std::pair<SectionBase *, std::function<SectionBase *()>>, 0>
       ToReplace;
   for (SectionBase &Sec : sections()) {
-    if ((Sec.Flags & SHF_ALLOC) || !StringRef(Sec.Name).starts_with(".debug"))
+    std::optional<DebugCompressionType> CType;
+    for (auto &[Matcher, T] : Config.compressSections)
+      if (Matcher.matches(Sec.Name))
+        CType = T;
+    // Handle --compress-debug-sections and --decompress-debug-sections, which
+    // apply to non-ALLOC debug sections.
+    if (!(Sec.Flags & SHF_ALLOC) && StringRef(Sec.Name).starts_with(".debug")) {
+      if (Config.CompressionType != DebugCompressionType::None)
+        CType = Config.CompressionType;
+      else if (Config.DecompressDebugSections)
+        CType = DebugCompressionType::None;
+    }
+    if (!CType)
       continue;
+
+    if (Sec.ParentSegment)
+      return createStringError(
+          errc::invalid_argument,
+          "section '" + Sec.Name +
+              "' within a segment cannot be (de)compressed");
+
     if (auto *CS = dyn_cast<CompressedSection>(&Sec)) {
-      if (Config.DecompressDebugSections) {
+      if (*CType == DebugCompressionType::None)
         ToReplace.emplace_back(
             &Sec, [=] { return &addSection<DecompressedSection>(*CS); });
-      }
-    } else if (Config.CompressionType != DebugCompressionType::None) {
-      ToReplace.emplace_back(&Sec, [&, S = &Sec] {
+    } else if (*CType != DebugCompressionType::None) {
+      ToReplace.emplace_back(&Sec, [=, S = &Sec] {
         return &addSection<CompressedSection>(
-            CompressedSection(*S, Config.CompressionType, Is64Bits));
+            CompressedSection(*S, *CType, Is64Bits));
       });
     }
   }
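
Read as a decision table, the new loop body in this hunk does roughly the following. The sketch is a simplified model rather than the code in this commit: `SectionInfo`, `Options`, `Action` and `decide` are made-up names, and `Matched` stands for the result of scanning `Config.compressSections` with last-match-wins.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for llvm::DebugCompressionType.
enum class Format { None, Zlib, Zstd };
enum class Action { Keep, Compress, Decompress, Error };

struct SectionInfo {
  std::string Name;
  bool Alloc;      // SHF_ALLOC is set
  bool Compressed; // section is already compressed
  bool InSegment;  // section has a parent segment
};

struct Options {
  Format DebugCompression = Format::None; // --compress-debug-sections
  bool DecompressDebug = false;           // --decompress-debug-sections
};

Action decide(const SectionInfo &Sec, std::optional<Format> Matched,
              const Options &Opts) {
  std::optional<Format> CType = Matched;
  // The debug-section options apply to non-ALLOC ".debug*" sections and
  // override whatever --compress-sections requested for them.
  const bool IsDebug = Sec.Name.rfind(".debug", 0) == 0;
  if (!Sec.Alloc && IsDebug) {
    if (Opts.DebugCompression != Format::None)
      CType = Opts.DebugCompression;
    else if (Opts.DecompressDebug)
      CType = Format::None;
  }
  if (!CType)
    return Action::Keep; // nothing requested for this section
  if (Sec.InSegment)
    return Action::Error; // sections within a segment are immutable
  if (Sec.Compressed) // a request for another format is ignored
    return *CType == Format::None ? Action::Decompress : Action::Keep;
  return *CType != Format::None ? Action::Compress : Action::Keep;
}
```

In this model, a section inside a segment yields `Action::Error` as soon as any request targets it, even when the request would otherwise be a no-op, which mirrors the `foo=zlib` case in the within-segment test.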
5 changes: 5 additions & 0 deletions llvm/lib/ObjCopy/ELF/ELFObject.cpp
@@ -548,6 +548,7 @@ CompressedSection::CompressedSection(const SectionBase &Sec,
CompressedData);

Flags |= ELF::SHF_COMPRESSED;
OriginalFlags |= ELF::SHF_COMPRESSED;
size_t ChdrSize = Is64Bits ? sizeof(object::Elf_Chdr_Impl<object::ELF64LE>)
: sizeof(object::Elf_Chdr_Impl<object::ELF32LE>);
Size = ChdrSize + CompressedData.size();
@@ -2161,6 +2162,10 @@ Error Object::removeSections(
std::begin(Sections), std::end(Sections), [=](const SecPtr &Sec) {
if (ToRemove(*Sec))
return false;
// TODO: A compressed relocation section may be recognized as
// RelocationSectionBase. We don't want such a section to be removed.
if (isa<CompressedSection>(Sec))
return true;
if (auto RelSec = dyn_cast<RelocationSectionBase>(Sec.get())) {
if (auto ToRelSec = RelSec->getSection())
return !ToRemove(*ToRelSec);
@@ -0,0 +1,38 @@
## Disallow (de)compression for sections within a segment as they are
## effectively immutable.
# RUN: rm -rf %t && mkdir %t && cd %t
# RUN: yaml2obj %s -o a
# RUN: not llvm-objcopy a /dev/null --compress-sections .text=zlib 2>&1 | FileCheck %s --implicit-check-not=error:

# CHECK: error: 'a': section '.text' within a segment cannot be (de)compressed

# RUN: not llvm-objcopy a /dev/null --compress-sections foo=none 2>&1 | FileCheck %s --check-prefix=CHECK2 --implicit-check-not=error:

# CHECK2: error: 'a': section 'foo' within a segment cannot be (de)compressed

## There is an error even if 'foo' is already compressed with zlib.
# RUN: not llvm-objcopy a /dev/null --compress-sections foo=zlib 2>&1 | FileCheck %s --check-prefix=CHECK3 --implicit-check-not=error:

# CHECK3: error: 'a': section 'foo' within a segment cannot be (de)compressed

--- !ELF
FileHeader:
Class: ELFCLASS64
Data: ELFDATA2LSB
Type: ET_EXEC
Machine: EM_X86_64
ProgramHeaders:
- Type: PT_LOAD
FirstSec: .text
LastSec: foo
Align: 0x1000
Offset: 0x1000
Sections:
- Name: .text
Type: SHT_PROGBITS
Offset: 0x1000
Content: C3
- Name: foo
Type: SHT_PROGBITS
Flags: [ SHF_COMPRESSED ]
Content: 010000000000000040000000000000000100000000000000789cd36280002d3269002f800151
128 changes: 128 additions & 0 deletions llvm/test/tools/llvm-objcopy/ELF/compress-sections.s
@@ -0,0 +1,128 @@
# REQUIRES: x86-registered-target, zlib, zstd

# RUN: rm -rf %t && mkdir %t && cd %t
# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o a.o
## '*0=none' wins because it is the last. '*0' sections are decompressed (if originally compressed) or kept unchanged (if uncompressed).
## No section is named 'nomatch'. The third option is a no-op.
# RUN: llvm-objcopy a.o out --compress-sections='*0=zlib' --compress-sections '*0=none' --compress-sections 'nomatch=none' 2>&1 | count 0
# RUN: llvm-readelf -S out | FileCheck %s --check-prefix=CHECK1

# CHECK1: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK1: .text PROGBITS [[#%x,TEXT:]] [[#%x,]] [[#%x,]] 00 AX 0 0 4
# CHECK1: foo0 PROGBITS [[#%x,FOO0:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK1-NEXT: .relafoo0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 3 8
# CHECK1-NEXT: foo1 PROGBITS [[#%x,FOO1:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK1-NEXT: .relafoo1 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 5 8
# CHECK1: nonalloc0 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK1-NEXT: .relanonalloc0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 7 8
# CHECK1-NEXT: nonalloc1 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK1-NEXT: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS 0 0 1

## Mixing zlib and zstd.
# RUN: llvm-objcopy a.o out2 --compress-sections '*c0=zlib' --compress-sections .debug_str=zstd
# RUN: llvm-readelf -Sr -x nonalloc0 -x .debug_str out2 2>&1 | FileCheck %s --check-prefix=CHECK2
# RUN: llvm-readelf -z -x nonalloc0 -x .debug_str out2 | FileCheck %s --check-prefix=CHECK2DE

# CHECK2: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK2: .text PROGBITS [[#%x,TEXT:]] [[#%x,]] [[#%x,]] 00 AX 0 0 4
# CHECK2: foo0 PROGBITS [[#%x,FOO0:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK2-NEXT: .relafoo0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 3 8
# CHECK2-NEXT: foo1 PROGBITS [[#%x,FOO1:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK2-NEXT: .relafoo1 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 5 8
# CHECK2: nonalloc0 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 8
# CHECK2-NEXT: .relanonalloc0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 IC 11 7 8
# CHECK2-NEXT: nonalloc1 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK2-NEXT: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MSC 0 0 8

## llvm-readelf -r doesn't support SHF_COMPRESSED SHT_RELA.
# CHECK2: warning: {{.*}}: unable to read relocations from SHT_RELA section with index 8: section [index 8] has an invalid sh_size ([[#]]) which is not a multiple of its sh_entsize (24)

# CHECK2: Hex dump of section 'nonalloc0':
## zlib with ch_size=0x10
# CHECK2-NEXT: 01000000 00000000 10000000 00000000
# CHECK2-NEXT: 08000000 00000000 {{.*}}
# CHECK2: Hex dump of section '.debug_str':
## zstd with ch_size=0x38
# CHECK2-NEXT: 02000000 00000000 38000000 00000000
# CHECK2-NEXT: 01000000 00000000 {{.*}}

# CHECK2DE: Hex dump of section 'nonalloc0':
# CHECK2DE-NEXT: 0x00000000 00000000 00000000 00000000 00000000 ................
# CHECK2DE-EMPTY:
# CHECK2DE-NEXT: Hex dump of section '.debug_str':
# CHECK2DE-NEXT: 0x00000000 41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA

## --decompress-debug-sections takes precedence, even if it is before --compress-sections.
# RUN: llvm-objcopy a.o out3 --decompress-debug-sections --compress-sections .debug_str=zstd
# RUN: llvm-readelf -S out3 | FileCheck %s --check-prefix=CHECK3

# CHECK3: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS 0 0 1

# RUN: llvm-objcopy a.o out4 --compress-sections '*0=zlib'
# RUN: llvm-readelf -S out4 | FileCheck %s --check-prefix=CHECK4

# CHECK4: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK4: .text PROGBITS [[#%x,TEXT:]] [[#%x,]] [[#%x,]] 00 AX 0 0 4
# CHECK4: foo0 PROGBITS [[#%x,FOO0:]] [[#%x,]] [[#%x,]] 00 AC 0 0 8
# CHECK4-NEXT: .relafoo0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 IC 11 3 8
# CHECK4-NEXT: foo1 PROGBITS [[#%x,FOO1:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK4-NEXT: .relafoo1 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 5 8
# CHECK4: nonalloc0 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 8
# CHECK4-NEXT: .relanonalloc0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 IC 11 7 8
# CHECK4-NEXT: nonalloc1 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK4-NEXT: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS 0 0 1

## If a section is already compressed, compression request for another format is ignored.
# RUN: llvm-objcopy a.o out5 --compress-sections 'nonalloc0=zlib'
# RUN: llvm-readelf -x nonalloc0 out5 | FileCheck %s --check-prefix=CHECK5
# RUN: llvm-objcopy out5 out5a --compress-sections 'nonalloc0=zstd'
# RUN: cmp out5 out5a

# CHECK5: Hex dump of section 'nonalloc0':
## zlib with ch_size=0x10
# CHECK5-NEXT: 01000000 00000000 10000000 00000000
# CHECK5-NEXT: 08000000 00000000 {{.*}}

# RUN: not llvm-objcopy --compress-sections=foo a.o out 2>&1 | \
# RUN: FileCheck %s --check-prefix=ERR1 --implicit-check-not=error:
# ERR1: error: --compress-sections: parse error, not 'section-glob=[none|zlib|zstd]'

# RUN: llvm-objcopy --compress-sections 'a[=zlib' a.o out 2>&1 | \
# RUN: FileCheck %s --check-prefix=ERR2 --implicit-check-not=error:
# ERR2: warning: invalid glob pattern, unmatched '['

# RUN: not llvm-objcopy a.o out --compress-sections='.debug*=zlib-gabi' --compress-sections='.debug*=' 2>&1 | \
# RUN: FileCheck -check-prefix=ERR3 %s
# ERR3: error: invalid or unsupported --compress-sections format: .debug*=zlib-gabi

# RUN: not llvm-objcopy a.o out --compress-sections='!.debug*=zlib' 2>&1 | \
# RUN: FileCheck -check-prefix=ERR4 %s
# ERR4: error: --compress-sections: negative pattern is unsupported

.globl _start
_start:
ret

.section foo0,"a"
.balign 8
.quad .text-.
.quad .text-.
.section foo1,"a"
.balign 8
.quad .text-.
.quad .text-.
.section nonalloc0,""
.balign 8
.quad .text+1
.quad .text+2
sym0:
.section nonalloc1,""
.balign 8
.quad 42
sym1:

.section .debug_str,"MS",@progbits,1
.Linfo_string0:
.asciz "AAAAAAAAAAAAAAAAAAAAAAAAAAA"
.Linfo_string1:
.asciz "BBBBBBBBBBBBBBBBBBBBBBBBBBB"
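
The hex dumps checked by CHECK2 and CHECK5 in the test above begin with the 24-byte `Elf64_Chdr` compression header. As an aid to reading them, here is a small decoder sketch. `Chdr64`, `readLE` and `parseChdr64LE` are illustrative names (LLVM has its own `Elf_Chdr_Impl` types), and the code assumes a little-endian ELF64 file as in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fields of Elf64_Chdr in on-disk order, as defined by the ELF gABI.
struct Chdr64 {
  std::uint32_t Type;      // ch_type: 1 = ELFCOMPRESS_ZLIB, 2 = ELFCOMPRESS_ZSTD
  std::uint32_t Reserved;  // ch_reserved
  std::uint64_t Size;      // ch_size: size of the uncompressed data
  std::uint64_t AddrAlign; // ch_addralign: alignment of the uncompressed data
};

// Reads an N-byte little-endian integer starting at offset Off.
static std::uint64_t readLE(const std::vector<std::uint8_t> &B,
                            std::size_t Off, std::size_t N) {
  assert(N <= 8 && Off + N <= B.size() && "read out of bounds");
  std::uint64_t V = 0;
  for (std::size_t I = 0; I < N; ++I)
    V |= static_cast<std::uint64_t>(B[Off + I]) << (8 * I);
  return V;
}

// Decodes the header at the start of a compressed section's contents.
// Returns nullopt if fewer than 24 bytes are available.
std::optional<Chdr64> parseChdr64LE(const std::vector<std::uint8_t> &Bytes) {
  if (Bytes.size() < 24)
    return std::nullopt;
  return Chdr64{static_cast<std::uint32_t>(readLE(Bytes, 0, 4)),
                static_cast<std::uint32_t>(readLE(Bytes, 4, 4)),
                readLE(Bytes, 8, 8), readLE(Bytes, 16, 8)};
}
```

Applied to the `nonalloc0` dump (`01000000 00000000 10000000 00000000 08000000 00000000`), this gives type 1 (zlib), size 0x10 and alignment 8, matching the two `.quad` values and the `.balign 8` in the assembly. The `.debug_str` dump decodes to type 2 (zstd), size 0x38 and alignment 1.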
29 changes: 29 additions & 0 deletions llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test
@@ -4,13 +4,42 @@
# RUN: yaml2obj %s -o %t
# RUN: llvm-objcopy --decompress-debug-sections %t %t.de
# RUN: llvm-readelf -S %t.de | FileCheck %s
# RUN: llvm-objcopy --compress-sections '*nonalloc=none' --compress-sections .debugx=none %t %t.1.de
# RUN: cmp %t.de %t.1.de

# CHECK: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK: .debug_alloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 AC 0 0 0
# CHECK-NEXT: .debug_nonalloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK-NEXT: .debugx PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK-NEXT: nodebug PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 0

# RUN: llvm-objcopy --compress-sections '.debug*=none' %t %t2.de
# RUN: llvm-readelf -S -x .debug_alloc -x .debug_nonalloc -x .debugx %t2.de | FileCheck %s --check-prefix=CHECK2

# CHECK2: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK2: .debug_alloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 A 0 0 1
# CHECK2-NEXT: .debug_nonalloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK2-NEXT: .debugx PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK2-NEXT: nodebug PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 0

# CHECK2: Hex dump of section '.debug_alloc':
# CHECK2-NEXT: 0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-EMPTY:
# CHECK2: Hex dump of section '.debug_nonalloc':
# CHECK2-NEXT: 0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-EMPTY:
# CHECK2-NEXT: Hex dump of section '.debugx':
# CHECK2-NEXT: 0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......

--- !ELF
FileHeader:
Class: ELFCLASS64
36 changes: 36 additions & 0 deletions llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
@@ -736,6 +736,42 @@ objcopy::parseObjcopyOptions(ArrayRef<const char *> RawArgsArr,
return createStringError(errc::invalid_argument, Reason);
}

for (const auto *A : InputArgs.filtered(OBJCOPY_compress_sections)) {
SmallVector<StringRef, 0> Fields;
StringRef(A->getValue()).split(Fields, '=');
if (Fields.size() != 2 || Fields[1].empty()) {
return createStringError(
errc::invalid_argument,
A->getSpelling() +
": parse error, not 'section-glob=[none|zlib|zstd]'");
}

auto Type = StringSwitch<DebugCompressionType>(Fields[1])
.Case("zlib", DebugCompressionType::Zlib)
.Case("zstd", DebugCompressionType::Zstd)
.Default(DebugCompressionType::None);
if (Type == DebugCompressionType::None && Fields[1] != "none") {
return createStringError(
errc::invalid_argument,
"invalid or unsupported --compress-sections format: %s",
A->getValue());
}

auto &P = Config.compressSections.emplace_back();
P.second = Type;
auto Matcher =
NameOrPattern::create(Fields[0], SectionMatchStyle, ErrorCallback);
// =none allows overriding a previous =zlib or =zstd. Reject negative
// patterns, which would be confusing.
if (Matcher && !Matcher->isPositiveMatch()) {
return createStringError(
errc::invalid_argument,
"--compress-sections: negative pattern is unsupported");
}
if (Error E = P.first.addMatcher(std::move(Matcher)))
return std::move(E);
}
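
The validation order in the loop above (shape of the argument, then the format name, then the negative-pattern check) can be modelled without any LLVM types. The following is a sketch under assumptions: `ParseResult` and `parseCompressSections` are hypothetical names, and a leading `!` is used as a simple proxy for what `isPositiveMatch()` reports for glob patterns. The error strings are copied from the diff and the test expectations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for llvm::DebugCompressionType.
enum class Format { None, Zlib, Zstd };

struct ParseResult {
  std::string Pattern; // meaningful only when Error is empty
  Format Fmt = Format::None;
  std::string Error; // empty on success
};

ParseResult parseCompressSections(std::string_view Arg) {
  ParseResult R;
  // Require exactly one '=' followed by a non-empty format name.
  const std::size_t Eq = Arg.find('=');
  if (Eq == std::string_view::npos ||
      Arg.find('=', Eq + 1) != std::string_view::npos ||
      Eq + 1 == Arg.size()) {
    R.Error = "--compress-sections: parse error, not "
              "'section-glob=[none|zlib|zstd]'";
    return R;
  }
  const std::string_view Pattern = Arg.substr(0, Eq);
  const std::string_view Value = Arg.substr(Eq + 1);
  if (Value == "zlib") {
    R.Fmt = Format::Zlib;
  } else if (Value == "zstd") {
    R.Fmt = Format::Zstd;
  } else if (Value != "none") {
    R.Error = "invalid or unsupported --compress-sections format: " +
              std::string(Arg);
    return R;
  }
  // "=none" can already override an earlier rule, so negative patterns are
  // rejected instead of being given a second, order-independent meaning.
  if (!Pattern.empty() && Pattern.front() == '!') {
    R.Error = "--compress-sections: negative pattern is unsupported";
    return R;
  }
  R.Pattern = std::string(Pattern);
  assert(R.Error.empty());
  return R;
}
```

Because the format is validated before the pattern, an argument such as `.debug*=zlib-gabi` reports the unsupported format and stops there, which is consistent with the ERR3 expectation in the test.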

Config.AddGnuDebugLink = InputArgs.getLastArgValue(OBJCOPY_add_gnu_debuglink);
// The gnu_debuglink's target is expected to not change or else its CRC would
// become invalidated and get rejected. We can avoid recalculating the
6 changes: 6 additions & 0 deletions llvm/tools/llvm-objcopy/ObjcopyOpts.td
@@ -35,6 +35,12 @@ def : Flag<["--"], "compress-debug-sections">, Alias<compress_debug_sections>,
AliasArgs<["zlib"]>;
def decompress_debug_sections : Flag<["--"], "decompress-debug-sections">,
HelpText<"Decompress DWARF debug sections">;
defm compress_sections
: Eq<"compress-sections",
"Compress or decompress sections using specified format. Supported "
"formats: zlib, zstd. Specify 'none' for decompression">,
MetaVarName<"<section-glob>=<format>">;

defm split_dwo
: Eq<"split-dwo", "Equivalent to --extract-dwo and <dwo-file> as the output file and no other options, "
"and then --strip-dwo on the input file">,