Update benchmark with ugrep 6.1 which is faster than hypergrep #44
Hello! Yes, I'd be happy to update my benchmarks using the latest ugrep v4.0.5. I'm excited about your plans for further speed-ups. And yes,
Yes. Simple grep is just a demo application. Large files are searched very quickly, but it lacks a lot of features, which makes it less worthwhile to compare. There is no line-counting overhead whatsoever, so it is definitely fast. I don't know if you're interested in any of the following details; I just want to convey ongoing efforts and short-term plans.

Motivation: I am very interested in and curious about finding new ways to speed up pattern matching algorithmically and trying them out in practice. Ugrep was born from this about four years ago (unoptimized), to try something new I came up with, which I call PM for "predict match". I got ugrep going based on a demo grep I wrote for RE/flex. I want to try hypergrep instead of simple grep to see where we stand now and what needs to be tweaked in ugrep, besides comparing it to some other optimizations that are/were planned but haven't been released yet.

Background: I've extensively used simple grep to evaluate and compare the speed of matching/search methods. Some early R&D work I did on a new regex search method focused primarily on classes of patterns that involve short and long strings in growing numbers (e.g. sets of words of various sizes). The bitap method and its derivatives are really poor at this when the window is short, since the window is limited to the shortest word in the set. I spent time looking for better ways, to see if it is possible to compete with hyperscan and other grep tools, and eventually found that PM-4 is superior when matching word sets of up to 2K words or so, and it does not need SIMD to run fast. It uses a Bloom filter of sorts for short 1-char to 4-char word patterns simultaneously, with bit banging. This is combined with bitap, depending on a formula that looks at the entropy of the pattern to decide which method should be used. When this combination is used, it is faster than hyperscan (i.e. simple grep) by predicting at which positions a match may occur.

What's next: It might be that other algorithmic combinations implemented in SIMD (AVX and AArch64) are better, or not. I've tried bitap-based methods in pure AVX2 with the 256-byte table in vectors, to have zero memory loads except for input, but it ran in about the same time as the scalar version. The latest ugrep v4 uses a very different way to match patterns, using AVX and AArch64 instructions that are picked by the decision logic when (roughly) deemed best for matching.

Anyway, I will try hyperscan in about two or three weeks (I am on a break) and see what happens.
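As background on bitap (also known as shift-and / shift-or), here is a minimal scalar sketch of mine, not ugrep's code: all partial matches of a single pattern are tracked as bits in one machine word, which is also why a multi-word bitap variant is constrained by the shortest word in the set, as noted above.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Returns the offset of the first match of `pat` in `text`, or -1.
long bitap_search(const char *text, size_t n, const char *pat) {
    size_t m = strlen(pat);
    if (m == 0 || m > 64) return -1;        // one 64-bit word limits m to 64
    uint64_t B[256] = {0};
    for (size_t i = 0; i < m; ++i)          // bit i of B[c] set if pat[i] == c
        B[(unsigned char)pat[i]] |= 1ull << i;
    uint64_t D = 0;                         // bit i set: pat[0..i] matches here
    for (size_t j = 0; j < n; ++j) {
        D = ((D << 1) | 1) & B[(unsigned char)text[j]];
        if (D & (1ull << (m - 1)))          // top bit set: full match ends at j
            return (long)(j - m + 1);
    }
    return -1;
}

int main() {
    const char *hay = "searching a haystack for a needle";
    printf("%ld\n", bitap_search(hay, strlen(hay), "needle")); // prints 27
}
```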
I just wanted to drop a short note that, after a bit of tweaking and adding two common grep-like optimizations, the upcoming ugrep v4.1 release is pretty fast on x64 and arm64, and almost always faster than other fast grep tools; see the preview report posted here: Genivia/ugrep#289. The difficulty is sometimes picking the number of threads and the search method that work best. Finally, it might again be important to point out for your benchmarks that the

PS. I hope you have a great week and happy coding!
OK. As promised, we dropped a new ugrep v4.3 with a performance report showing that ugrep is nearly always faster than other greps on x64 and arm64 systems. I have not been able to compare against hypergrep because it doesn't compile (#45). Another update, v4.3.2, will be available soon to speed up recursive search further with improved thread scheduling. Please note that there is one case of slow regex patterns
FYI. ugrep 5.0 was released some time ago, an important milestone. This release fixes the performance problems with "leading wildcard" patterns such as
FYI. ugrep 6.1 was released and is faster than hypergrep for several test cases we tried.
@genivia-inc, I checked the latest commit of hypergrep (with stable vectorscan) against the latest commit of ugrep; ugrep was slower. When searching in /usr, the number of results differed, since hypergrep has a relaxed implementation of https://github.com/p-ranav/hypergrep/blob/master/src/is_binary.cpp (and hypergrep can't search binary files due to a hardcoded check), but hypergrep scanned more files there and was still 3 times faster. Vectorscan was compiled with

I think it is worth comparing RE/flex against vectorscan. Back in 2019, when the paper https://www.usenix.org/system/files/nsdi19-wang-xiang.pdf was released, hyperscan was 183.3x faster than PCRE2 on one dataset.
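For reference, this is roughly what a minimal block-mode scan with vectorscan/hyperscan looks like (the hs_* calls are the library's actual API; the pattern, input, and header path are placeholders of mine and may vary per install):

```cpp
#include <hs/hs.h>   // or <hs.h>, depending on the install; link with -lhs
#include <cstdio>
#include <cstring>

static int on_match(unsigned int id, unsigned long long from,
                    unsigned long long to, unsigned int flags, void *ctx) {
    (void)id; (void)from; (void)flags; (void)ctx;
    printf("match ending at offset %llu\n", to);
    return 0;  // returning non-zero would stop the scan
}

int main() {
    hs_database_t *db = nullptr;
    hs_compile_error_t *err = nullptr;
    if (hs_compile("needle", HS_FLAG_DOTALL, HS_MODE_BLOCK, nullptr, &db, &err)
            != HS_SUCCESS) {
        fprintf(stderr, "compile failed: %s\n", err->message);
        hs_free_compile_error(err);
        return 1;
    }
    hs_scratch_t *scratch = nullptr;          // per-thread scratch space
    if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) return 1;
    const char *hay = "a haystack with a needle in it";
    hs_scan(db, hay, (unsigned)strlen(hay), 0, scratch, on_match, nullptr);
    hs_free_scratch(scratch);
    hs_free_database(db);
    return 0;
}
```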
Some observations:

- All runs above use recursive searching, and all of them are on large file systems only, which is not what a typical user does frequently. These file systems are also likely to contain a lot of binary files (which are skipped, so performance may depend more on how that is handled than on pattern search). Such recursive searches are subject to many more unknown variables in how things actually perform. To avoid this kind of selective benchmarking, I set up a controlled experimental environment in the ugrep benchmark repo, with many different search scenarios typical of grepping, that people can download and try themselves.
- For the first test, are you sure hgrep isn't matching newlines in the pattern too, so that the number of matches you get is larger?
- In your comparison, there is no way to tell whether files are evicted from the file cache before or during a run. The runs are intended to measure search speed, not IO latency, which is unpredictable. Sometimes one gets lucky or not, and it may even be reproducible.
- Recursive search performance also depends on the allocation of buffer memory and worker threads, which in turn depends a lot on the type of CPU and machine you're using. Ugrep tries to guess how many workers should be assigned, but this is not perfect. Despite that, recursive searching easily leads to a 2x to 3x difference without anything more specifically optimized. You see this with ripgrep, whose recursive searching is slower by 2x or 3x compared to ugrep, as the benchmarks show for the Swift repo search but not for the SSL repo search; it's not consistent, and that is fine. I picked the benchmark tests not to benefit ugrep, but to compare.
- Search optimization with SIMD is mostly pattern prediction, not regex matching with a DFA. Simple vector operations that predict a possible match with byte comparisons and AND/OR logic are quick and many times faster than classic regex matching with a DFA and non-SIMD methods (see the sketch after this list).
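To make the last point concrete, here is a minimal AVX2 prediction sketch (my own illustration, not ugrep's code; the function name and test strings are mine): it compares 32 input bytes against the first and last bytes of a needle at their relative offset, ANDs the two masks, and runs a full verification only at the predicted positions.

```cpp
#include <immintrin.h>   // AVX2 intrinsics (GCC/Clang: compile with -mavx2)
#include <cstdint>
#include <cstdio>
#include <cstring>

// Return the offset of the first occurrence of needle (length >= 2) in hay,
// or -1; tail handling is omitted to keep the sketch short.
long find_with_prediction(const char *hay, size_t n,
                          const char *needle, size_t m) {
    const __m256i first = _mm256_set1_epi8(needle[0]);
    const __m256i last  = _mm256_set1_epi8(needle[m - 1]);
    for (size_t i = 0; i + m + 32 <= n; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(hay + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(hay + i + m - 1));
        // One bit per byte position where BOTH first and last byte match.
        uint32_t mask = (uint32_t)_mm256_movemask_epi8(
            _mm256_and_si256(_mm256_cmpeq_epi8(a, first),
                             _mm256_cmpeq_epi8(b, last)));
        while (mask) {                           // verify predicted positions
            unsigned bit = __builtin_ctz(mask);  // GCC/Clang builtin
            if (memcmp(hay + i + bit, needle, m) == 0)
                return (long)(i + bit);
            mask &= mask - 1;                    // clear lowest set bit
        }
    }
    return -1;
}

int main() {
    const char *hay = "the quick brown fox jumps over the lazy dog plus padding";
    printf("%ld\n", find_with_prediction(hay, strlen(hay), "fox", 3)); // 16
}
```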
@genivia-inc, thank you for the quick response! Regarding

Well, with a few more tests (launched a few times for warm-up), I confirmed my theory that vectorscan is fast when searching for needles in a haystack:
While the time explodes when counting characters (effectively killing the benefit of SIMD):
And vectorscan fails with
Good point. You may have noticed by now that my intent was to poke a bit with the change to the title of this issue, to remind you to update your README, as you may have had in mind many months ago. The comparison is quite out of date (by almost a year) and includes some known bad regex search cases for ugrep that were addressed long ago. Ugrep is faster in some cases and not in others. I tested on a machine that supports AVX2. I don't test AVX512BW, which may actually be slower; I know why, and others have pointed it out too, but it is not a priority at this time because that CPU is not as widely used as SSE/AVX2 and AArch64.
Ugrep also uses SIMD in this case, but it uses a different approach to find "needles", in a way that should give more stable performance for a wide range of regex patterns and words to search (for ugrep there is no difference between the two). In addition, a quick Bloom filter check improves match prediction, which is even more important when some regex pattern matches are short, e.g. one, two or three characters. Large word sets (over 1000 words to search) also give stable performance numbers, even where SIMD becomes less useful or is no longer applicable: no sudden slowdowns to minutes instead of a fraction of a second.

To comment on the last reply: the time difference observed in your first example may not be due to SIMD, but is more likely dependent on 1) the buffer size, since system time is 40ms higher, and 2) the SIMD match method used. Even the exact same method depends on a heuristic choice of which needles to find; this works very well in general, but there is variation because, well, it's a heuristic, so it is never perfect. Hence, one will find regex patterns that work better for one tool but not for another. Recursive searching is even less dependent on SIMD: directory traversal order and worker-thread assignment for load balancing are greater factors. It is very difficult to generically optimize for all OSes and CPUs so that it always runs faster.
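As a toy illustration of this kind of Bloom-filter match prediction (my own sketch; ugrep's PM-4 and its exact hashing differ), the filter below records a hash of the first four bytes of each word in the set, assuming all words are at least four bytes long, and the scan runs the expensive verification only where the filter says a match may start:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

struct Predictor {
    uint64_t bits[1024] = {0};              // 64 Kbit filter

    static uint32_t hash4(const char *p) {  // FNV-1a over 4 bytes -> 16 bits
        uint32_t h = 2166136261u;
        for (int i = 0; i < 4; ++i)
            h = (h ^ (unsigned char)p[i]) * 16777619u;
        return h & 0xFFFF;
    }
    void add(const char *word) {            // assumes strlen(word) >= 4
        uint32_t h = hash4(word);
        bits[h >> 6] |= 1ull << (h & 63);
    }
    bool maybe(const char *p) const {       // could any word start at p?
        uint32_t h = hash4(p);
        return bits[h >> 6] & (1ull << (h & 63));
    }
};

// Count matches, running full comparisons only at predicted positions.
int count_hits(const char *text, size_t n,
               const std::vector<std::string> &words, const Predictor &f) {
    int hits = 0;
    for (size_t i = 0; i + 4 <= n; ++i)
        if (f.maybe(text + i))              // cheap reject for most positions
            for (const auto &w : words)
                if (i + w.size() <= n && memcmp(text + i, w.data(), w.size()) == 0)
                    ++hits;
    return hits;
}

int main() {
    std::vector<std::string> words = {"needle", "haystack"};
    Predictor f;
    for (const auto &w : words) f.add(w.c_str());
    const char *text = "searching the haystack for a needle";
    printf("%d\n", count_hits(text, strlen(text), words, f)); // prints 2
}
```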
Nice project!
I came across it because a ugrep user found a Reddit post about the project. Thanks for including ugrep in the benchmarks; it is always interesting to see how ugrep fares in comparison. If there are bottlenecks to its speed, then I put them on the TODO list to resolve (see my note below).
The ugrep version used in these benchmarks isn't up to date. More optimal search methods and optimizations were implemented in ugrep 3.12 and continued with 4.0. You may want to run the benchmarks with the latest ugrep, e.g. v4.0.5. The old version 3.11 had an issue picking the wrong search algorithm in some cases, causing a fallback to the slower default method.
Note that there are also plans to enhance ugrep's search speed further with some tricks other greps use, as well as a new method to deal with pattern cases that ugrep doesn't yet handle well: a long initial wildcard such as `.*foo` is not optimized at all, but that is easy to do by looking for `foo` (see the sketch below). These plans were (and some still are) delayed to give priority to new and improved features while ensuring reliability. I won't give a list of things that were added over time (since this is not about ugrep, but about hypergrep); getting high performance together with lots of new features is a tricky combination. People always ask about new features or improvements to them; they care a bit less about speed. Most often, when they really want speed, it's usually to search "cold" file systems, in which case qgrep might be a good choice, or perhaps ugrep-indexer.
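A sketch of the general idea behind that leading-wildcard trick (my illustration, not ugrep's implementation): since `.*` matches any prefix within a line, a line matches `.*foo` exactly when it contains the literal `foo`, so a fast substring search suffices:

```cpp
#include <cstdio>
#include <string_view>

// A line matches ".*" + literal iff the literal occurs anywhere in the line,
// so the regex engine can be bypassed with a plain substring search.
bool matches_dotstar_literal(std::string_view line, std::string_view literal) {
    return line.find(literal) != std::string_view::npos;
}

int main() {
    printf("%d\n", matches_dotstar_literal("a line with foo in it", "foo")); // 1
    printf("%d\n", matches_dotstar_literal("no match here", "foo"));         // 0
}
```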
I am curious: is hypergrep based on hyperscan? EDIT: sorry, dumb question, I see the answer up front in the README now. I wanted to ask because hypergrep has a "simple grep" example that works OK, but not stellar.