Memory corruption when string.char(0) used as a key #2

Open
rohitjoshi opened this issue Aug 11, 2016 · 9 comments

@rohitjoshi

We have been seeing occasional memory corruption when string.char(0) is used as part of the key. We can reproduce the core dump on only one machine. When we replace it with '_', we are not able to reproduce the core dump.

`
    local str_null = string.char(0)

    local servers, nodes = {}, {}
    for serv, weight in pairs(server_list) do
        local id = string.gsub(serv, ":", str_null)

        servers[id] = serv
        nodes[id] = weight
    end

`

@doujiang24
Member

@rohitjoshi Thanks for your feedback :)
Can you provide the core backtrace?
Please give me a minimal example so that I can try to reproduce it on my side, along with the OpenResty version and Lua version. Thanks :)

PS: replacing : with str_null is only needed when we want the same hash result as the default nginx chash :)
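
The separator choice described above can be sketched as follows. This is a minimal illustration, assuming `server_list` maps "host:port" strings to weights as in the reporter's snippet; `build_index` and the sample addresses are hypothetical and not part of the library. It only shows the two key formats side by side and does not attempt to reproduce or explain the reported corruption.

```lua
-- Hypothetical helper (not part of lua-resty-balancer): builds the two
-- lookup tables from the reporter's snippet with a configurable separator.
local function build_index(server_list, sep)
    local servers, nodes = {}, {}
    for serv, weight in pairs(server_list) do
        -- string.gsub returns (result, count); the parentheses keep only
        -- the resulting string.
        local id = (string.gsub(serv, ":", sep))
        servers[id] = serv
        nodes[id] = weight
    end
    return servers, nodes
end

-- Sample data, assumed for illustration only.
local server_list = { ["127.0.0.1:8080"] = 2, ["127.0.0.1:8081"] = 1 }

-- NUL separator: only needed to match the default nginx chash result.
local servers_nul, nodes_nul = build_index(server_list, string.char(0))

-- Printable separator: the workaround the reporter switched to.
local servers_us, nodes_us = build_index(server_list, "_")

-- Lua strings carry an explicit length, so a NUL byte inside a key is
-- legal and the key keeps its full length.
local nul_key = "127.0.0.1" .. string.char(0) .. "8080"
assert(#nul_key == #"127.0.0.1:8080")
assert(nodes_nul[nul_key] == 2)
assert(servers_us["127.0.0.1_8080"] == "127.0.0.1:8080")
```

Note that `%` is special in a `string.gsub` replacement string, so a separator containing `%` would need to be escaped as `%%`.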

@rohitjoshi
Author

@doujiang24 Thanks for looking at this issue. Yes, we have replaced str_null with "_" and it is working fine; we just wanted to report this. We are able to reproduce it in only one Mac environment.

Here is one of the stack traces, but it is random and we see a different stack trace every time.
`

* thread #1: tid = 0x0000, 0x00007fff89490f06 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff89490f06 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff841fd4ec libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff971906e7 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff9348e396 libsystem_malloc.dylib`szone_error + 626
    frame #4: 0x00007fff934840e9 libsystem_malloc.dylib`small_malloc_from_free_list + 1093
    frame #5: 0x00007fff93480b64 libsystem_malloc.dylib`szone_malloc_should_clear + 1411
    frame #6: 0x00007fff93486a52 libsystem_malloc.dylib`malloc_zone_memalign + 114
    frame #7: 0x00007fff93488417 libsystem_malloc.dylib`posix_memalign + 55
    frame #8: 0x0000000106ea7fc5 nginx`ngx_memalign(alignment=16, size=4096, log=0x00007fdec3489030) + 37 at ngx_alloc.c:57
    frame #9: 0x0000000106e870dc nginx`ngx_palloc [inlined] ngx_palloc_block(pool=0x00007fdec4842c00, size=) + 24 at ngx_palloc.c:189
    frame #10: 0x0000000106e870c4 nginx`ngx_palloc [inlined] ngx_palloc_small(pool=0x00007fdec4842c00, size=80, align=1) + 47 at ngx_palloc.c:176
    frame #11: 0x0000000106e87095 nginx`ngx_palloc(pool=0x00007fdec4842c00, size=80) + 293 at ngx_palloc.c:130
    frame #12: 0x0000000106e8758f nginx`ngx_pcalloc(pool=<unavailable>, size=80) + 15 at ngx_palloc.c:305
    frame #13: 0x0000000106e896ca nginx`ngx_chain_get_free_buf(p=0x00007fdec4842c00, free=) + 90 at ngx_buf.c:172
    frame #14: 0x0000000106ee1bee nginx`ngx_http_chunked_body_filter(r=0x00007fdec4842c50, in=<unavailable>) + 302 at ngx_http_chunked_filter_module.c:151
    frame #15: 0x0000000106ee3369 nginx`ngx_http_gzip_body_filter(r=0x00007fdec4842c50, in=0x00007fdec4816650) + 217 at ngx_http_gzip_filter_module.c:326
    frame #16: 0x0000000106ee6906 nginx`ngx_http_ssi_body_filter(r=0x00007fdec4842c50, in=<unavailable>) + 326 at ngx_http_ssi_filter_module.c:411
    frame #17: 0x0000000106ee9524 nginx`ngx_http_charset_body_filter(r=0x00007fdec4842c50, in=) + 260 at ngx_http_charset_filter_module.c:647
    frame #18: 0x0000000106f38353 nginx`ngx_http_lua_capture_body_filter(r=<unavailable>, in=<unavailable>) + 131 at ngx_http_lua_capturefilter.c:125
    frame #19: 0x0000000106f47b68 nginx`ngx_http_lua_body_filter(r=0x00007fdec4842c50, in=) + 600 at ngx_http_lua_bodyfilterby.c:319
    frame #20: 0x0000000106e89aca nginx`ngx_output_chain(ctx=<unavailable>, in=0x00007fff58d7c120) + 186 at ngx_output_chain.c:65
    frame #21: 0x0000000106eec941 nginx`ngx_http_copy_filter(r=0x00007fdec4842c50, in=0x00007fff58d7c120) + 113 at ngx_http_copy_filter_module.c:152
    frame #22: 0x0000000106ebae93 nginx`ngx_http_output_filter(r=0x00007fdec4842c50, in=0x00007fff58d7c120) + 83 at ngx_http_core_module.c:1981
    frame #23: 0x0000000106ec4801 nginx`ngx_http_send_special(r=0x00007fdec4842c50, flags=) + 113 at ngx_http_request.c:3355
    frame #24: 0x0000000106f30682 nginx`ngx_http_lua_send_chain_link [inlined] ngx_http_lua_send_special(flags=1) + 834 at ngx_http_lua_util.c:579
    frame #25: 0x0000000106f30650 nginx`ngx_http_lua_send_chain_link(r=, ctx=, in=) + 784 at ngx_http_lua_util.c:533
    frame #26: 0x0000000106f32c03 nginx`ngx_http_lua_run_thread [inlined] ngx_http_lua_handle_exit + 54 at ngx_http_lua_util.c:2272
    frame #27: 0x0000000106f32bcd nginx`ngx_http_lua_run_thread(L=, r=0x00007fdec4842c50, ctx=0x00007fdec4816240, nrets=) + 4605 at ngx_http_lua_util.c:1052
    frame #28: 0x0000000106f4249a nginx`ngx_http_lua_socket_tcp_resume_helper(r=0x00007fdec4842c50, socket_op=<unavailable>) + 266 at ngx_http_lua_socket_tcp.c:5193
    frame #29: 0x0000000106f432c7 nginx`ngx_http_lua_socket_tcp_read [inlined] ngx_http_lua_socket_handle_read_success(r=0x00007fdec4842c50, u=) + 121 at ngx_http_lua_socket_tcp.c:2976
    frame #30: 0x0000000106f4324e nginx`ngx_http_lua_socket_tcp_read(r=0x00007fdec4842c50, u=<unavailable>) + 1630 at ngx_http_lua_socket_tcp.c:2094
    frame #31: 0x0000000106f41bc4 nginx`ngx_http_lua_socket_tcp_handler(ev=) + 132 at ngx_http_lua_socket_tcp.c:2726
    frame #32: 0x0000000106ea4402 nginx`ngx_event_process_posted(cycle=0x00007fdec3808250, posted=0x0000000106fa2490) + 162 at ngx_event_posted.c:33
    frame #33: 0x0000000106eac30d nginx`ngx_worker_process_cycle(cycle=0x00007fdec3808250, data=) + 173 at ngx_process_cycle.c:753
    frame #34: 0x0000000106eaa67e nginx`ngx_spawn_process(cycle=0x00007fdec3808250, proc=(nginx`ngx_worker_process_cycle at ngx_process_cycle.c:728), data=0x0000000000000003, name="worker process", respawn=-3) + 782 at ngx_process.c:198
    frame #35: 0x0000000106eabb9c nginx`ngx_start_worker_processes(cycle=0x00007fdec3808250, n=8, type=-3) + 124 at ngx_process_cycle.c:358
    frame #36: 0x0000000106eab140 nginx`ngx_master_process_cycle(cycle=0x00007fdec3808250) + 352 at ngx_process_cycle.c:130
    frame #37: 0x0000000106e84e89 nginx`main(argc=<unavailable>, argv=<unavailable>) + 2937 at nginx.c:367
    frame #38: 0x00007fff938045ad libdyld.dylib`start + 1
`

@agentzh
Member

agentzh commented Aug 11, 2016

@rohitjoshi Could you run your nginx process under Valgrind? You'd better turn off daemon and master_process in your nginx.conf.
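
A minimal sketch of the nginx.conf change described above, assuming both directives are placed in the main (top-level) context of an existing configuration and that the rest of the file stays unchanged:

```nginx
# Main (top-level) context of nginx.conf.
daemon off;          # stay in the foreground instead of daemonizing
master_process off;  # run as a single process, without separate workers
```

With both turned off, the process started under Valgrind is the same one that handles requests, so its memory errors are reported directly.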

Also, it's better to specify the ./configure options

    --with-luajit-xcflags='-DLUAJIT_USE_VALGRIND -DLUAJIT_USE_SYSMALLOC' \
    --with-debug \
    --with-no-pool-patch

while building the latest version of OpenResty (1.9.15.1).

The position indicated by gdb may not be the first scene of crime while Valgrind can usually pinpoint that.

Thanks!
-agentzh

@rohitjoshi
Author

@agentzh We tried to run Valgrind on Mac OS X but got the error below. We are not able to reproduce the issue on Linux.

bad executable (__PAGEZERO is not 4 GB) valgrind: /opt/openresty/nginx/sbin/nginx: cannot execute binary file

@agentzh
Member

agentzh commented Aug 11, 2016

@rohitjoshi Yeah, Valgrind is shaky on Mac OS X. Try running Valgrind against exactly the same OpenResty app and setup on Linux to see if you can find anything.

@rohitjoshi
Author

@agentzh We tried that. In fact, on Mac OS X we are able to reproduce it (consistently) only on my team member's Mac. I could not reproduce it on my Mac.

@agentzh
Member

agentzh commented Aug 11, 2016

@rohitjoshi Still it's worth trying Valgrind since it can find memory issues that do not lead to segmentation faults. Ensure you have rebuilt OpenResty with the special ./configure options I gave above.

@rohitjoshi
Author

@agentzh Thanks. We will set up a nightly run with these options.

@agentzh
Member

agentzh commented Aug 11, 2016

@rohitjoshi BTW, those options only work with valgrind runs though. Do not attempt to run the resulting nginx without valgrind.
