-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf improvement and refactoring #17
Comments
I'm all for improving the interface, and providing (ideally zero-cost) wrappers in the SHA1 class. Introducing a sha1 namespace is fine. I'm not even sure if sha1::hash needs to be a class, maybe plain functions and an abstract type "sha1::hash" would be more appropriate and easier to use? Of course, RAII might be a reason to stick with classes, but on the other hand, the intermediate state of SHA-1 calculations has a fixed size and can be allocated directly on the stack, I guess. By the way, I wonder why are you willing to put so much effort into this library. I'm aware that SHA-1 is still needed to support legacy systems (including badly-designed systems like Git), but I wonder why newly written code would ever need SHA-1. I'm very curious about your use-case. Also, what is the advantage of this library over the many other hashing libraries (which are usually more optimized to start with)? Is it our (non-)license? But, please don't get me wrong. Even though I wonder if it's worth it, I appreciate your effort and I'm willing to assist you. |
Thanks for considering my proposal! I will gladly answer al of your questions. Re: Namespace, class, and freely hanging methods Re: Why I do this This particular implementation satisfies most of my requirements - non-GPL, small and simple, and now header-only. :) I have to improve its efficiency and reliability, but that is easily doable. Re: Why SHA-1 I've implemented a C++ wrapper around the POSIX socket C API. The next step is to implement a WebSocket server, so that a user can connect to a C++ program from a web browser. The WebSocket handshake requires SHA-1 and base64. Re: What options I have I do notice, however, that our coding styles differ, and coming to an agreement on such a major refactoring is likely to leave one of us unhappy. Given that the core computation code is 20+ years old, I don't expect any major flaws in it yet to be discovered. Thus, I don't exclude the option of copying that core code in to my own project and moving on. Now that I've laid out my thoughts, the latter option seems easier, unless you convince me the other way. :) Same question to you: How do you use this library? Has any of the people who have forked or downloaded it, shared with you how they use it? |
I'm still very interested in your code, especially if you are willing to share it as public domain, and to add some lines for backwards compatibility. I don't mind if the implementation changes too much, as long as it is still easy to see that the code is safe. The implementation already changed quite a lot from the original implementation in C. (Maybe it would even make sense to go back to the original C implementation and to add C++ wrappers on top of those? On the other hand, this would miss some of the non-C++-specific improvements that we have accumulated so far.) Regarding your question: I don't have any insights about how this library is used. I assume that it's also just keeping legacy interfaces working, which applies to my use case as well. Given your target platform, it makes of course sense to try to improve performance. |
It's great to hear back! I've been working on an all-round http+json endpoint. And recently, I started exploring a whole new area - virtual memory. I expect for that to keep me busy for quite a few more months. WebSocket, for which I need SHA-1, keeps getting pushed back, though it's still on the roadmap. I have lost some context on my refactoring plan. We should resurrect this discussion once WebSocket bubbles up to the top of my priority list. Meanwhile, I have a question about "public domain". Does it have a concrete definition, or does it mean "no restrictions whatsoever"? I want to make sure it is the same or less restrictive than the MIT license, which I have adopted as one of the least restrictive licenses. |
I'd like to mention that I had much better experiences with HTTP(S) long polling than with WebSockets, at least with regard to stability (especially in the presence of networking issues). On the other hand, of course I don't want to discourage you on improving the SHA1 code. ;-)
Agreed.
"Public domain" has a strict meaning: https://en.wikipedia.org/wiki/Public_domain It is by definition less restrictive than any license you could think of, because it describes the absence of the need for a license. In some countries, e.g. many European countries such as Germany, the situation is slightly more complicated, but in the end boils down to the least amount of restrictions possible under law. |
Thanks for clarifying the "public domain". That makes sense. You are right about WebSocket. I don't know how much longer I will keep it on the roadmap. Let me finish the things that are on my mind, and if nothing new comes up, I'll try to find some time to refactor the SHA code. We are still talking months though. |
No problem, just come back if you have something. Also, just in case you finally decide to abandon this, feel free to drop a note as well. |
perf/no alloc + checks For perf and reliability you either have to use string_view or ptrs into buffer, be it string or anything else. If you are ok with C abi, then you can roll your own slice struct similar to what the clang C23 bound checks do. comptime As far as I understand it, C++ offers only ways to comptime create string- or other literals and compound objects of those things via templates, but not filling runtime-usable stuff via consteval fn and alike. |
The use of
std::string
kills perf in many ways - every construction and modification may perform a heap allocation. That may also throw an exception, and ultimately terminate the process.I hacked up a change that uses a
char [BLOCK_BYTES]
instead of astd::string
, and I consistently get a 25% reduction in time for the slow test. To enter the new implementation, I introduced a newupdate()
overload.This new implementation must be a base, and the existing public methods should call into it. Then I noticed a few things in the code that I prefer to fix as I go:
final()
method is not cohesive - it does 3 things: it finishes the work, produces a result (as astd::string
), and resets the sate. I would like to split all those 3 steps in separate methods.This line of thinking is leading me to a new class
sha1
or even better - put it in a namespace -sha1::hash
. We can keep the existing classSHA1
with the public API it has today. It will simply maintain a member ofsha1::hash
, and will delegate all the work to it.To summarize it, I want to have an efficient low-level class, so that clients who care about perf and reliability, can use it directly. We can keep the existing, wrapper around for convenience and back-compat purposes.
Please let me know what you think about this.
The text was updated successfully, but these errors were encountered: