No central location for AVX math #32
Did you look through the Universal SIMD NEP and the work being done to make this not x86-specific? The idea is that abstract intrinsics will be replaced (at compile time, by macros) with platform-specific ones.
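For concreteness, a minimal sketch of what that expansion looks like with numpy's npyv_* universal intrinsics (this assumes numpy's build-internal simd.h header from the NEP 38 work; the loop body is my own illustration):

```cpp
// Sketch assuming numpy's build-internal universal SIMD layer
// (numpy/core/src/common/simd/simd.h); npyv_* names follow the NEP 38 work.
#include "simd/simd.h"

#if NPY_SIMD  // defined when the target has any SIMD support
static void
add_f32(const float *a, const float *b, float *dst, npy_intp len)
{
    const int vstep = npyv_nlanes_f32;      // lanes per vector register
    npy_intp i = 0;
    for (; i + vstep <= len; i += vstep) {
        // npyv_load_f32/npyv_add_f32 expand at compile time to
        // _mm256_loadu_ps/_mm256_add_ps on AVX2, vld1q_f32/vaddq_f32 on NEON, etc.
        npyv_f32 va = npyv_load_f32(a + i);
        npyv_f32 vb = npyv_load_f32(b + i);
        npyv_store_f32(dst + i, npyv_add_f32(va, vb));
    }
    for (; i < len; ++i) {                  // scalar tail
        dst[i] = a[i] + b[i];
    }
}
#endif
```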
As long as that package is external to numpy and Python, and is a library that can be loaded and used universally, then I agree. The vector math routines are universal across platforms and languages and should be decoupled from numpy.
This is dependent on action from NumPy. It would be good to open an issue there and link that back here.
btw, last week I took the time to show a complete compare-and-contrast example of the log operation written for numpy in C macro expansion vs. C++ templates. I am biased, but I think the C++ template version is more portable and easier to read and maintain. The cpp file ops_log.cpp stands on its own. Then there is another routine elsewhere, both the Intel and Microsoft compilers ship their own, and gcc has its own as well. That is at least 5 different versions of AVX log functions... which one is the best of breed?
Then further, the numpy implementation (which is well optimized by a skilled developer who knew what they were doing)... even that well-written code has a missing optimization:
the calls to get_exponent and get_mantissa are very similar and can be fused into one routine that returns both the exponent and the mantissa. It is easier to spot this optimization in the templated code because it is easier to read and follow. My C++ editor can tag all the functions, showing me both template and intrinsics information, and compiler errors are easier to understand and fix. But my editor has little clue what is going on with the macro-expansion C code. Is anyone going to further optimize the log operation in numpy?
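To make the fused routine concrete, here is a minimal AVX2 sketch (my own illustration, not numpy's code): one pass over the IEEE-754 bit pattern yields both results, whereas two separate calls each reload and mask the same input. It ignores special cases (zeros, denormals, infinities, NaNs):

```cpp
#include <immintrin.h>

struct ExpMant {
    __m256 exponent;  // unbiased exponent of each lane, as float
    __m256 mantissa;  // mantissa renormalized into [1.0, 2.0)
};

static inline ExpMant get_exponent_mantissa(__m256 x)
{
    const __m256i exp_mask  = _mm256_set1_epi32(0x7F800000);
    const __m256i mant_mask = _mm256_set1_epi32(0x007FFFFF);
    const __m256i one_bits  = _mm256_set1_epi32(0x3F800000); // bits of 1.0f

    __m256i bits = _mm256_castps_si256(x);   // reinterpret once, use twice

    // Exponent: shift the biased exponent field down and remove the bias.
    __m256i biased = _mm256_srli_epi32(_mm256_and_si256(bits, exp_mask), 23);
    __m256  expo   = _mm256_cvtepi32_ps(
                         _mm256_sub_epi32(biased, _mm256_set1_epi32(127)));

    // Mantissa: keep the fraction bits and OR in the exponent of 1.0f,
    // which renormalizes the value into [1.0, 2.0).
    __m256  mant = _mm256_castsi256_ps(
                       _mm256_or_si256(_mm256_and_si256(bits, mant_mask),
                                       one_bits));
    return { expo, mant };
}
```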
xref numpy/numpy#17698
In particular, there is this breakout of the Universal SIMD functions. Maybe we could be its first users?
In numpy, I notice numpy intrinsics being checked in by xiegengxin.
However, it is difficult to tell which routines are best of breed, because similar routines are also checked into this project.
Then further, when tests are added, like this one for testing np.log strided, they do not test the multithreaded version.
Further, writing C macro-expansion code in the style sketched below is not easily portable and is harder to follow -- is this only done because template code is not allowed?
And is this done because we have not broken out and linked to an external math package?
That seems to me like the primary problem to resolve; otherwise code like this will keep being written,
and it becomes difficult to track best of breed because this code is being written in two locations.
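For contrast, an illustrative toy example of the two styles (my own sketch, not the elided snippet). The first is in the spirit of numpy's .src templating, where a preprocessing step substitutes @type@/@TYPE@ before the C compiler ever sees the code:

```c
/**begin repeat
 * #type = npy_float, npy_double#
 * #TYPE = FLOAT, DOUBLE#
 */
static void
@TYPE@_square(@type@ *dst, const @type@ *src, npy_intp n)
{
    for (npy_intp i = 0; i < n; i++) {
        dst[i] = src[i] * src[i];
    }
}
/**end repeat**/
```

The equivalent C++ template is one definition that the compiler instantiates per type, and the editor and debugger can see through the instantiation:

```cpp
#include <cstddef>

template <typename T>
static void square(T *dst, const T *src, std::ptrdiff_t n)
{
    for (std::ptrdiff_t i = 0; i < n; ++i)
        dst[i] = src[i] * src[i];
}
```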
Further, new Intel intrinsics have been added to immintrin.h, where Intel, Microsoft, etc. provide the speedups for functions like np.sin(). This, again, is in contrast to code being checked into numpy/core/src/umath/simd.inc.src.
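For example, a minimal sketch assuming an SVML-enabled toolchain (ICC, or a recent MSVC); gcc and clang do not declare these intrinsics, which is exactly the portability problem:

```cpp
#include <immintrin.h>

// SVML vector sine: declared in immintrin.h by the Intel and Microsoft
// compilers. There is no single sin instruction; the compiler expands
// this into a call into its vector math library.
void sin8(const float *src, float *dst)
{
    __m256 x = _mm256_loadu_ps(src);         // load 8 floats
    _mm256_storeu_ps(dst, _mm256_sin_ps(x)); // 8-lane sine
}
```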