@embedFile corrupts compiler memory #22867

Open
RetroDev256 opened this issue Feb 12, 2025 · 3 comments
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@RetroDev256
Contributor

RetroDev256 commented Feb 12, 2025

Zig Version

0.14.0-dev.3187+d4c85079c

Steps to Reproduce and Observed Behavior

First, write a program that uses @embedFile on something large.

pub fn main() !void {
    const massive_file = @embedFile("teehee.txt");
    try std.io.getStdOut().writeAll(massive_file);
}

const std = @import("std");

In this example, I just dumped 1 GiB into the file.

retrodev@lime ~ $ dd if=/dev/random of=teehee.txt bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.60441 s, 412 MB/s

Now, try to compile it.
In most cases, you get a bunch of compile errors about types not having certain members:
[Image: screenshot of the compile errors]
Sometimes you get funny messages:

retrodev@lime ~ $ zig build-exe temp.zig 
Segmentation faultlysis

Expected Behavior

I expect the compiler to function as normal, and allow me to yeet whatever I want into my program.

@RetroDev256 RetroDev256 added the bug Observed behavior contradicts documented or intended behavior label Feb 12, 2025
@RetroDev256
Contributor Author

RetroDev256 commented Feb 12, 2025

This may be a tad more broken than I first suspected:

retrodev@lime ~ $ l teehee.txt
35840 -rw-r--r-- 1 retrodev retrodev 36700160 Feb 11 23:19 teehee.txt
retrodev@lime ~ $ zig build-exe temp.zig 
repos/Zig/.zig/0.14.0-dev.3187+d4c85079c/files/lib/std/debug.zig:549:24: error: root source file struct 'fmt' has no member named ''
    const msg = std.fmt.bufPrint(buf[0..size], format, args) catch |err| switch (err) {
                ~~~~~~~^~~~~~~~~
repos/Zig/.zig/0.14.0-dev.3187+d4c85079c/files/lib/std/fmt.zig:1:1: note: struct declared here
//! String formatting and parsing.
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
repos/Zig/.zig/0.14.0-dev.3187+d4c85079c/files/lib/std/debug.zig:549:24: error: root source file struct 'fmt' has no member named ''
    const msg = std.fmt.bufPrint(buf[0..size], format, args) catch |err| switch (err) {
                ~~~~~~~^~~~~~~~~
repos/Zig/.zig/0.14.0-dev.3187+d4c85079c/files/lib/std/fmt.zig:1:1: note: struct declared here
//! String formatting and parsing.
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...

Yep, the issue reproduces even if you only put ~35 MiB into the file.
EDIT: on retrying the reproduction, the 35 MiB threshold isn't fixed; it shifts slightly between attempts. I can now repro this issue with 75 MiB, but not with 35 MiB.
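
Since the failing size seems to move around, a small helper for generating test files of a chosen size may make it easier to narrow down the threshold. This is only a sketch: `make_blob` is a hypothetical helper name, and it assumes GNU dd (the `M` suffix and `status=none` are not universal).

```sh
#!/bin/sh
# make_blob FILE SIZE_MIB: write SIZE_MIB mebibytes of random data to FILE.
# Hypothetical helper; assumes GNU dd for `bs=1M` and `status=none`.
make_blob() {
    dd if=/dev/urandom of="$1" bs=1M count="$2" status=none
}

# Possible usage (commented out because it needs zig and the temp.zig from
# the report above; the sizes are arbitrary starting points):
# for mib in 16 32 64 128; do
#     make_blob teehee.txt "$mib"
#     if zig build-exe temp.zig >/dev/null 2>&1; then
#         echo "$mib MiB: ok"
#     else
#         echo "$mib MiB: failed"
#     fi
# done
```

Because the threshold is not stable between runs, it is probably worth repeating each size a few times before drawing conclusions.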

@RetroDev256
Contributor Author

RetroDev256 commented Feb 12, 2025

Ok, got a trace. As I suspected, it has something to do with the InternPool:

Full trace
thread 10576 panic: reached unreachable code
Analyzing test.zig
      %5 = ret_type() 
      %6 = dbg_stmt(2, 5)
      %7 = str("big_file.txt")
      %8 = embed_file(%7) 
      %9 = validate_const(%8) 
      %10 = dbg_var_val(%8, "big_file")
      %11 = dbg_stmt(3, 5)
    > %12 = decl_ref("std") 
      %13 = dbg_stmt(3, 12)
      %14 = field_ptr(%12, "io") 
      %15 = dbg_stmt(3, 25)
      %16 = field_call(.auto, %14, "getStdOut", []) 
      %17 = ref(%16) 
      %18 = dbg_stmt(3, 36)
      %19 = field_call(.auto, %17, "writeAll", [
        {
          %20 = break_inline(%19, %8)
        },
      ]) 
      %21 = try(%19, {
        %22 = err_union_code(%19) 
        %23 = dbg_stmt(3, 5)
        %24 = ret_node(%22) 
      }) 
      %25 = ensure_result_used(%21) 
      %26 = restore_err_ret_index_unconditional(.none) 
      %27 = ret_implicit(@void_value) 
    For full context, use the command
      zig ast-check -t test.zig

  in /home/retrodev/repos/Zig/zig-master/lib/std/start.zig
    > %2067 = is_non_err(%2066) 
  in /home/retrodev/repos/Zig/zig-master/lib/std/start.zig
    > %2069 = block({%2064..%2068}) 
  in /home/retrodev/repos/Zig/zig-master/lib/std/start.zig
    > %2031 = switch_block(%2026,
        else => {%2049..%2153},
        @void_type => {%2032..%2040},
        @noreturn_type, @u8_type => {%2041..%2048}) 
  in /home/retrodev/repos/Zig/zig-master/lib/std/start.zig
    > %1827 = call(.auto, %1825, []) 
  in /home/retrodev/repos/Zig/zig-master/lib/std/start.zig
    > %1648 = call(.auto, %1646, [
        {%1649},
        {%1650},
        {%1651},
      ]) 
  in /home/retrodev/repos/Zig/zig-master/lib/std/start.zig
    > %1645 = field_call(nodiscard .auto, %1643, "exit", [
        {%1646..%1652},
      ]) 

/home/retrodev/repos/Zig/zig-master/lib/std/debug.zig:518:14: 0x1906d7d in assert (zig)
    if (!ok) unreachable; // assertion failure
             ^
/home/retrodev/repos/Zig/zig-master/src/InternPool.zig:1761:19: 0x1c6a6a0 in wrap (zig)
            assert(unwrapped.index <= ip.getIndexMask(u32));
                  ^
/home/retrodev/repos/Zig/zig-master/src/InternPool.zig:11594:88: 0x1af1aae in getOrPutTrailingString__anon_133609 (zig)
        @enumFromInt(@intFromEnum((String.Unwrapped{ .tid = tid, .index = start }).wrap(ip)));
                                                                                       ^
/home/retrodev/repos/Zig/zig-master/src/InternPool.zig:11546:37: 0x194350f in getOrPutString__anon_53763 (zig)
    return ip.getOrPutTrailingString(gpa, tid, @intCast(slice.len + 1), embedded_nulls);
                                    ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:6843:57: 0x2b9877c in zirDeclRef (zig)
    const decl_name = try zcu.intern_pool.getOrPutString(
                                                        ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1152:65: 0x271bd8b in analyzeBodyInner (zig)
            .decl_ref                     => try sema.zirDeclRef(block, inst),
                                                                ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1006:26: 0x2b21ce1 in analyzeFnBody (zig)
    sema.analyzeBodyInner(block, body) catch |err| switch (err) {
                         ^
/home/retrodev/repos/Zig/zig-master/src/Zcu/PerThread.zig:2702:23: 0x26b5a87 in analyzeFnBodyInner (zig)
    sema.analyzeFnBody(&inner_block, fn_info.body) catch |err| switch (err) {
                      ^
/home/retrodev/repos/Zig/zig-master/src/Zcu/PerThread.zig:1619:40: 0x22d7897 in analyzeFuncBody (zig)
    var air = try pt.analyzeFnBodyInner(func_index);
                                       ^
/home/retrodev/repos/Zig/zig-master/src/Zcu/PerThread.zig:1539:66: 0x1ef7ae6 in ensureFuncBodyUpToDate (zig)
    const ies_outdated, const new_failed = if (pt.analyzeFuncBody(func_index)) |result|
                                                                 ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:35895:38: 0x273b2ad in resolveInferredErrorSet (zig)
        try pt.ensureFuncBodyUpToDate(func_index);
                                     ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:32637:69: 0x2c47514 in analyzeIsNonErrComptimeOnly (zig)
                const resolved_ty = try sema.resolveInferredErrorSet(block, src, set_ty);
                                                                    ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:32666:56: 0x3082993 in analyzeIsNonErr (zig)
    const result = try sema.analyzeIsNonErrComptimeOnly(block, src, operand);
                                                       ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:19383:32: 0x2ba3880 in zirIsNonErr (zig)
    return sema.analyzeIsNonErr(block, src, operand);
                               ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1188:66: 0x271c5e1 in analyzeBodyInner (zig)
            .is_non_err                   => try sema.zirIsNonErr(block, inst),
                                                                 ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:6245:34: 0x30a7209 in resolveBlockBody (zig)
        if (sema.analyzeBodyInner(child_block, body)) |_| {
                                 ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:6222:33: 0x2c41945 in zirBlock (zig)
    return sema.resolveBlockBody(parent_block, src, &child_block, body, inst, &label.merges);
                                ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1677:37: 0x2727ca8 in analyzeBodyInner (zig)
            } else try sema.zirBlock(block, inst),
                                    ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:6245:34: 0x30a7209 in resolveBlockBody (zig)
        if (sema.analyzeBodyInner(child_block, body)) |_| {
                                 ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:10891:45: 0x309e571 in resolveProngComptime (zig)
                return sema.resolveBlockBody(spa.parent_block, src, child_block, prong_body, spa.switch_block_inst, merges);
                                            ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:13381:36: 0x30a9409 in resolveSwitchComptime (zig)
    return spa.resolveProngComptime(
                                   ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:13209:34: 0x309da83 in resolveSwitchComptimeLoop (zig)
        if (resolveSwitchComptime(
                                 ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:12416:49: 0x2bb4dda in zirSwitchBlock (zig)
                return resolveSwitchComptimeLoop(
                                                ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1212:69: 0x271cb7a in analyzeBodyInner (zig)
            .switch_block                 => try sema.zirSwitchBlock(block, inst, false),
                                                                    ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1006:26: 0x2b21ce1 in analyzeFnBody (zig)
    sema.analyzeBodyInner(block, body) catch |err| switch (err) {
                         ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:8238:27: 0x2c6200b in analyzeCall (zig)
        sema.analyzeFnBody(&child_block, fn_zir_info.body) catch |err| switch (err) {
                          ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:7214:43: 0x2b953a1 in zirCall__anon_476399 (zig)
    const call_inst = try sema.analyzeCall(block, func, func_ty, callee_src, call_src, modifier, ensure_result_used, args_info, call_dbg_node, .call);
                                          ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1144:62: 0x271bb28 in analyzeBodyInner (zig)
            .call                         => try sema.zirCall(block, inst, .direct),
                                                             ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1006:26: 0x2b21ce1 in analyzeFnBody (zig)
    sema.analyzeBodyInner(block, body) catch |err| switch (err) {
                         ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:8238:27: 0x2c6200b in analyzeCall (zig)
        sema.analyzeFnBody(&child_block, fn_zir_info.body) catch |err| switch (err) {
                          ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:7214:43: 0x2b953a1 in zirCall__anon_476399 (zig)
    const call_inst = try sema.analyzeCall(block, func, func_ty, callee_src, call_src, modifier, ensure_result_used, args_info, call_dbg_node, .call);
                                          ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1144:62: 0x271bb28 in analyzeBodyInner (zig)
            .call                         => try sema.zirCall(block, inst, .direct),
                                                             ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1024:30: 0x2306f6e in analyzeInlineBody (zig)
    if (sema.analyzeBodyInner(block, body)) |_| {
                             ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1057:39: 0x1f0a1ae in resolveInlineBody (zig)
    return (try sema.analyzeInlineBody(block, body, break_target)) orelse .unreachable_value;
                                      ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:7513:65: 0x3131582 in analyzeArg (zig)
                const uncoerced_arg = try sema.resolveInlineBody(block, arg_body, zir_call.call_inst);
                                                                ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:7753:41: 0x2c59a41 in analyzeCall (zig)
        arg.* = try args_info.analyzeArg(sema, block, arg_idx, param_ty, func_ty_info, callee, maybe_func_inst);
                                        ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:7214:43: 0x2b9651c in zirCall__anon_476401 (zig)
    const call_inst = try sema.analyzeCall(block, func, func_ty, callee_src, call_src, modifier, ensure_result_used, args_info, call_dbg_node, .call);
                                          ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1145:62: 0x271bb63 in analyzeBodyInner (zig)
            .field_call                   => try sema.zirCall(block, inst, .field),
                                                             ^
/home/retrodev/repos/Zig/zig-master/src/Sema.zig:1006:26: 0x2b21ce1 in analyzeFnBody (zig)
    sema.analyzeBodyInner(block, body) catch |err| switch (err) {
                         ^
/home/retrodev/repos/Zig/zig-master/src/Zcu/PerThread.zig:2702:23: 0x26b5a87 in analyzeFnBodyInner (zig)
    sema.analyzeFnBody(&inner_block, fn_info.body) catch |err| switch (err) {
                      ^
/home/retrodev/repos/Zig/zig-master/src/Zcu/PerThread.zig:1619:40: 0x22d7897 in analyzeFuncBody (zig)
    var air = try pt.analyzeFnBodyInner(func_index);
                                       ^
/home/retrodev/repos/Zig/zig-master/src/Zcu/PerThread.zig:1539:66: 0x1ef7ae6 in ensureFuncBodyUpToDate (zig)
    const ies_outdated, const new_failed = if (pt.analyzeFuncBody(func_index)) |result|
                                                                 ^
/home/retrodev/repos/Zig/zig-master/src/Compilation.zig:3962:38: 0x1cb3156 in processOneJob (zig)
            pt.ensureFuncBodyUpToDate(func) catch |err| switch (err) {
                                     ^
/home/retrodev/repos/Zig/zig-master/src/Compilation.zig:3897:30: 0x1b27f85 in performAllTheWorkInner (zig)
            try processOneJob(@intFromEnum(Zcu.PerThread.Id.main), comp, job);
                             ^
/home/retrodev/repos/Zig/zig-master/src/Compilation.zig:3645:36: 0x19b144c in performAllTheWork (zig)
    try comp.performAllTheWorkInner(main_progress_node);
                                   ^
/home/retrodev/repos/Zig/zig-master/src/Compilation.zig:2259:31: 0x19a81bc in update (zig)
    try comp.performAllTheWork(main_progress_node);
                              ^
/home/retrodev/repos/Zig/zig-master/src/main.zig:4503:20: 0x19e78a7 in updateModule (zig)
    try comp.update(prog_node);
                   ^
/home/retrodev/repos/Zig/zig-master/src/main.zig:3693:21: 0x1a54f39 in buildOutputType (zig)
        updateModule(comp, color, root_prog_node) catch |err| switch (err) {
                    ^
/home/retrodev/repos/Zig/zig-master/src/main.zig:274:31: 0x1909063 in mainArgs (zig)
        return buildOutputType(gpa, arena, args, .{ .build = .Exe });
                              ^
/home/retrodev/repos/Zig/zig-master/src/main.zig:215:20: 0x19062e3 in main (zig)
    return mainArgs(gpa, arena, args);
                   ^
/home/retrodev/repos/Zig/zig-master/lib/std/start.zig:656:37: 0x1905de7 in main (zig)
            const result = root.main() catch |err| {
                                    ^
???:?:?: 0x7f79904392ad in ??? (libc.so.6)
Unwind information for `libc.so.6:0x7f79904392ad` was not available, trace may be incomplete

Aborted

Glancing over the code, it seems that the string is simply "too big" and can't be stored in the InternPool. AFAICT, the InternPool index depends on the length of the string, and the maximum length in turn depends on ip.getIndexMask(u32): the total length of all the strings in the InternPool (including the sentinels) must be less than or equal to 2^32-1 in a single-threaded build, or, per thread, less than 1/32 of that (128 MiB). I guess this helps explain why I don't need very much data (in that one case, 35 MiB) to overflow the InternPool. At the same time, very large projects should accumulate a lot of string data over time, so this does seem like an important issue that would eventually cripple large projects.

Edit: the mask value/limit for the index is actually 67108863

@gabeuehlein
Contributor

Edit: the mask value/limit for the index is actually 67108863

Small nitpick: the precise limit seems to be 2^(32 - ip.tid_width) - 1, where ip.tid_width can change based on the number of threads passed to InternPool.init. This can be observed by running zig build-exe a.zig -j<n>, with a.zig containing the following:

const std = @import("std");

pub fn main() !void {
    try std.io.getStdOut().writeAll("a" ** (1 << 28) ++ "a" ++ "b" ++ "c");
}

The above snippet crashes for n = 8 and succeeds for n = 1 and n = 2 (I didn't test any other values of n). This would mean that computers with lots of threads would run into this issue before computers with fewer threads. With 255 threads, I'm pretty sure the limit would be about 16 MB of string bytes, which could certainly become an issue for projects with lots of strings. I'll add, however, that most projects don't have anywhere near enough strings to cause issues, even with a 16 MB limit. For example, strings -d $(which zig) | wc -c gives me about 1.5 MB of string data, which includes a lot of junk that isn't actually interned as a string. I'd say that this isn't that big of an issue right now, but it should definitely be addressed later (i.e. before 1.0).
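
To make the arithmetic concrete, here is a rough sketch that evaluates 2^(32 - tid_width) - 1 for a few widths. `max_string_index` is a hypothetical helper written for illustration, not something in the Zig codebase, and the widths are illustrative: how the compiler derives tid_width from -j<n> is not shown here.

```sh
#!/bin/sh
# Hypothetical helper: largest string index that fits when `tid_width` of
# the 32 index bits are reserved for the thread id (assumed layout).
max_string_index() {
    echo $(( (1 << (32 - $1)) - 1 ))
}

for w in 0 3 6 8; do
    printf 'tid_width=%d limit=%d bytes\n' "$w" "$(max_string_index "$w")"
done
# prints:
# tid_width=0 limit=4294967295 bytes
# tid_width=3 limit=536870911 bytes
# tid_width=6 limit=67108863 bytes
# tid_width=8 limit=16777215 bytes
```

A width of 6 gives the 67108863 mask quoted above, and a width of 8 gives the roughly 16 MB figure estimated for 255 threads.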
