-
Notifications
You must be signed in to change notification settings - Fork 939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ragel-bitmap for less memory #1343
Conversation
Looks like the tests are failing on older versions of ruby because of a bundler incompatibility unrelated to this code. |
.travis.yml
Outdated
@@ -3,7 +3,8 @@ cache: bundler | |||
|
|||
bundler_args: --without local_development | |||
before_install: | |||
- gem install bundler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have to install the bundler explicitly, #1308
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to remove, was just trying to get the tests to pass.
ruby 2.6.3
jruby 9.2.7.0
|
@ahorek can you share the benchmark script you used to generate that please? |
sure, it's nothing special, just parsing a few real world emails benchmark script
these bitmaps will be stored as bigints and each lookup has to be computed which is much more expensive then a simple array reference |
Hmm... I suppose it depends on the use case. For the most part I'd imagine people will trade the speed for the memory, but some folks might be parsing tons and tons of emails. I think it'd be pretty easy to add an option to class Ragel::Bitmap
class << self
attr_reader :get_speed_lose_memory
def get_speed_lose_memory!
@get_speed_lose_memory = true
end
def from(width, numbers)
if get_speed_lose_memory
reverse_engineer_bitmap(width, numbers)
else
new(width, numbers)
end
end
end
@get_speed_lose_memory
end That way people could call |
Thanks @ahorek, i'm able to reproduce your results. I'm getting about 10 i/s with this pr and 148 i/s from the currently released gem. For my case the only reason a parser is being required is for the |
I tried using a string as the backing store for the bitmap instead of a bignum ( fcheung/ragel-bitmap@8dd2279 ) I'm seeing similar memory decreases (~23mb). I still get a performance hit, but it is less noticeable. With the benchmark script above & a random email from my inbox I get 158 i/s on master, and 97i/s with my branch. Still quite a big hit, but getting closer. |
Nice! I did some testing of A = Array.new(20_000, 123)
B = A.pack("c*")
W = A.pack("n*")
Benchmark.ips do |x|
x.report('array', <<-end)
A[5678]
end
x.report('words.byteslice.unpack', <<-end)
idx = 5678 << 1
W.byteslice(idx, 2).unpack("n")
end
x.report('getbyte', <<-end)
B.getbyte(5678)
end
x.report('bytes << 8 | bytes', <<-end)
(B.getbyte(5678) << 8) | B.getbyte(5678)
end
x.report('words | words', <<-end)
idx = 5678 << 1
(W.getbyte(idx) << 8) | W.getbyte(idx + 1)
end
end Result:
So from this microbenchmark we get the best performance from shift-ing and or-ing two getbyte calls. A little over 2x faster than byteslice.unpack, but still a little more than 3x slower than plain array access. I made a branch to test this and used the benchmark above: FWIW, I left arrays with values >= 2**16 alone, since we only have one such array and that space could be considered "less wasted" anyways. Before
After
|
@kddeisz's approach has a design flaw, the resident memory decreases, but it allocates 3 magnitudes more memory during runtime. So if you want to save the memory, in reality it'll do the opposite. here's a comparsion how much memory is needed to parse a single email ( https://github.com/SamSaffron/memory_profiler )
@jhawthorn's version looks promising. I hope we could find a good tradeoff between memory / performance. Thank you all for working on this! btw: here's an interesting talk about the topic https://www.youtube.com/watch?v=ubykHUyNi_0 |
We can switch over to “benchmark/ipsa” which in addition to giving
iterations per second also show allocations per iteration.
Love the conversation here BTW.
|
Okay friends - I finally got a minute to get back to this. I've updated and pushed up a new version of Could you folks that are better at benchmarking than me give this a shot? I think the strings are going to end up yielding much better results than bignums. EDIT: Tests on 1.8 were failing because apparently 1.9 introduced |
Hey everyone - I'm bringing this back up in case there's any interest, otherwise I'm going to close this out. |
I think ultimately any solution needs to not have any decrease in parsing time to be accepted. I remember talking about this in slack. I think that if you programmatically generate an array of the same size that it takes dramatically less memory. I think the bulk of the memory is coming from the VM and parser and not the array itself. While there might be some performance improvements to be made there (@nobu any ideas?), one solution not fully explored could be having the intermediate representation generate an array like: ::Ragel::Bitmap::Array8.new("<string>").to_a I think if we can find a solution that is as performant as the current parser while reducing memory footprint that such a solution could move forward. |
I think the performance is acceptable now, even if it's still slightly worse a quick bench
an open question is, why do we need 1.5-1.7MB of ruby code to parse an email address in the first place? https://github.com/kddeisz/mail/blob/use-ragel-bitmap/lib/mail/parsers/address_lists_parser.rl @jeremy any feedback on this? since you're the only maintainer, we can't go any further without your approval |
Looking at those numbers this branch is using 80% less memory at a trade-off of going 10% slower. Can you give any guidance on moving forwards? |
It's been 3 years, so I'm assuming this isn't going to happen. Happy to reopen if we hear back from the maintainers. <3 |
Sorry for making a "+1" comment but at the moment the mail gem uses as much memory as all of rails and its component gems put together (approximately 30mb for mail to 30mb for the rest of rails). It looks like this PR stalled, which is a shame. |
hey @mikel could you take a look again? This PR was closed due to radio silence, but I think it's still a good idea to consider. Thanks! |
bump |
@kddnewton mind to reopen this PR and rebase it on |
:D Thanks all 🚢 |
Hi friends -
This is an attempt at mitigating very large memory usage as per #1342. ragel-bitmap is a gem I made yesterday that has a replace-inline tool for ruby code generated by ragel that replaces integer arrays with bitmap objects with
#[]
defined so they act the same. @schneems found this patch reduces memory consumption by about 23mb.In terms of performance, I haven't found any kind of significant difference. That being said, my performance measurement abilities are relatively rudimentary, so I would be happy to be coached on this if someone else wanted to take a shot at it.