-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Neighbourlist integer size #977
Neighbourlist integer size #977
Conversation
@Iximiel writing a regtest that fails with master and fails with this is a bit tricky, because we do not want to store a trajectory with 1M atoms. But you can write a simple awk script that generates such trajectory on the fly (see e.g. |
I did used the trick from rt-fix-226, so no new trajectories have been pushed. |
Note: the problem should require much less than 1M particles. |
I am confused. Why are you adding dynamically loaded code? What we need is simply a test that computes the coordination between a group of ~ 100k atoms, and that will crash with master and not crash with the modified version. And it should be a separate test indeed, not using dynamic libraries (otherwise, for instance if we at some point we run on windows this test will not be used). |
I would also suggest not to have a test with a small (working) number of atoms. We only need to test the failing case. Coordinations are already tested somewhere else. |
I'm actually using ~100K atoms, I don't know why I persuaded myself that I am using 1M atoms...
Because I want to test if the problem arises from the neighbor list, and I am testing it in the more Isolated way possible, before testing it in a more complex environment. I'm moving the test to a make-style one to not use dlopen |
@GiovanniBussi #include <vector>
#include <utility>
int main(){
size_t d1=100000;
size_t dd=(d1*(d1-1))/2;
std::vector <std::pair<size_t,size_t>> test(dd);
} That it is similar to what it is happening in the constructor of the NL, In plumed, without int main(int, char**){
Pbc pbc{};
pbc.setBox(PLMD::Tensor({1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0}));
std::vector<AtomNumber> list0(100000);
size_t i=0;
for (auto& an:list0){
an.setIndex(i);
++i;
}
bool serial=true;
bool do_pbc=false;
Communicator cm{};
auto nl=NeighborList( list0, serial, do_pbc, pbc,cm);
} what do you think? |
Thanks!
Does this imply that in the original plumed test you received a std::bad_alloc? I thought you had an unexplained crash. If it's a std::bad_alloc it should be reported in the log file. Are you sure it is not? Then I see that the issue is in storing the neighbor list. If we have more than 232 elements allowed in the list (which is the reason to use long long), and if atoms are stored as 8 bytes integers, this will result on 28(232) = 64 Gb, which is anyway out of reach for most architectures. |
I refined the test more to understand what it is happening:
I think it is better to risk a bad_alloc (and then construct a gentle way out of the problem) with the dimension stored in In this, I do not think I have manage to reproduce the
|
I agree. indexes should be size_t. The bad_alloc will be reported on the log file. We can to even more: as we know that the critical allocation happens in that constructor, you can put it in a There's even a way to report this with a nested exception, something like:
|
commit 1b70cb0 Author: Daniele Rapetti <[email protected]> Date: Tue Oct 10 10:56:35 2023 +0200 Added an exception to catch the memory error commit 19f37da Author: Daniele Rapetti <[email protected]> Date: Tue Oct 10 10:26:57 2023 +0200 impreoved readability for NL commit 6c50341 Author: Daniele Rapetti <[email protected]> Date: Tue Oct 10 09:41:14 2023 +0200 test now check that the results for initialization are the same with the "simpler" algortithm commit c469d22 Author: Daniele Rapetti <[email protected]> Date: Mon Oct 9 16:58:55 2023 +0200 refactored the NL constructors commit 3e8e66c Author: Daniele Rapetti <[email protected]> Date: Mon Oct 9 16:07:53 2023 +0200 reformatted the test commit 03e5a41 Author: Daniele Rapetti <[email protected]> Date: Mon Oct 9 15:12:21 2023 +0200 integer type change and small optimization of the initialization of the NL commit 2d5eb2e Author: Daniele Rapetti <[email protected]> Date: Mon Oct 9 14:09:45 2023 +0200 added a test to control the NL initialization(atomsID) commit 04f762c Author: Daniele Rapetti <[email protected]> Date: Mon Oct 9 11:45:17 2023 +0200 added a test to control the NL initialization(size)
I did some change in the code:
For the initialization performance I wrote this small benchmark: #include "plumed/tools/AtomNumber.h"
#include "plumed/tools/Communicator.h"
#include "plumed/tools/NeighborList.h"
#include "plumed/tools/Pbc.h"
#include "plumed/tools/Stopwatch.h"
#include <iostream>
using PLMD::AtomNumber;
using PLMD::NeighborList;
using PLMD::Tensor;
using std::vector;
int main(int, char **) {
PLMD::Stopwatch sw;
PLMD::Pbc pbc{};
pbc.setBox(Tensor({1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0}));
PLMD::Communicator cm{};
bool serial = true;
bool do_pbc = false;
const size_t nat0 = 10000;
vector<AtomNumber> list0(nat0);
const size_t nat1 = nat0;
vector<AtomNumber> list1(nat1);
{
auto sww = sw.startStop("setIndex");
for (size_t i=0;i<nat0;++i) {
list0[i].setIndex(i);
list1[i].setIndex(i);
}
}
for (auto rep = 0; rep < 100; ++rep) {
{
auto sww = sw.startStop("Single list");
auto nl = NeighborList(list0, serial, do_pbc, pbc, cm);
}
{
bool do_pair = false;
auto sww = sw.startStop("Double list, no pairs");
auto nl = NeighborList(list0, list1, serial, do_pair, do_pbc, pbc, cm);
}
{
bool do_pair = true;
auto sww = sw.startStop("Double list, with pairs");
auto nl = NeighborList(list0, list1, serial, do_pair, do_pbc, pbc, cm);
}
}
std::cerr << sw;
} whose results are:
|
@@ -19,7 +19,7 @@ | |||
You should have received a copy of the GNU Lesser General Public License | |||
along with plumed. If not, see <http://www.gnu.org/licenses/>. | |||
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */ | |||
#ifndef __PLUMED_tools_NeighborList_h | |||
#ifndef __PLUMED_tools_NeighborList_h //{ |
Check notice
Code scanning / CodeQL
Commented-out code Note
In local the catch(...) works, here in the CI I get the stack dump along with the error message, I try to workaround that with a regtest_after. |
I added the "optional" memory error if NL is compiled on a mac system. The reason of these modification, I am not 100% sure about this: looks like that on mac if you finish the RAM you start to get the memory allocated in a page file, so you are not getting the memory error I had set up previously, and you get a very heavy performance hit (I think that this is the reason of the tests that getting more than 6 hrs to fail). |
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #977 +/- ##
==========================================
- Coverage 85.18% 84.10% -1.09%
==========================================
Files 600 602 +2
Lines 55555 56104 +549
==========================================
- Hits 47325 47184 -141
- Misses 8230 8920 +690
☔ View full report in Codecov by Sentry. |
Description
I found that if a user ask a full neighbor list with a more than circa ~1M atoms (meaning that the neighbor list would have
nat(nat-1)/2
elements) plumed will crash, this may be a bug due to the integers type on some indexes:By compiling with
--enable-debug --enable-debug-glibcxx
I get the following messagethe plumed.dat is
I'm setting up this as a draft because I am unsure if the problem is only on the NL or is more "deep"
Target release
I would like my code to appear in release 2.9
Type of contribution
Copyright
COPYRIGHT
file with the correct license information. Code should be released under an open source license. I also used the commandcd src && ./header.sh mymodulename
in order to make sure the headers of the module are correct.Tests