fs.pp.construct_gems_using_nmf error #22

Open
AllenYGY opened this issue Nov 27, 2024 · 2 comments

Comments

@AllenYGY

AllenYGY commented Nov 27, 2024

Hi, when I run the FlowSig tutorial notebook 'mouse_embryo_stereoseq_example.ipynb' with the default parameters in a Python 3.8 environment, I get the following error:

0001 numerical instability (try 9)
0000 learning rate: 1.95e-05
W0000 00:00:1732718987.791411   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 0. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791489   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 1. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791505   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 2. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791537   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 3. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791554   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 4. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791567   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 5. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791657   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 6. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791672   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 7. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791691   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 8. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791718   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 9. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791733   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 10. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791758   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 11. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791794   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 12. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791805   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 13. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791816   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 14. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791827   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 15. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791849   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 16. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791863   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 17. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791893   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 18. The input might not be valid. Filling lower-triangular output with NaNs.
W0000 00:00:1732718987.791908   18995 cholesky_op_gpu.cu.cc:205] Cholesky decomposition was not successful for batch 19. The input might not be valid. Filling lower-triangular output with NaNs.
---------------------------------------------------------------------------
NumericalDivergenceError                  Traceback (most recent call last)
Cell In[36], line 1
----> 1 fs.pp.construct_gems_using_nsf(adata,
      2                               n_gems=20,
      3                               layer_key='count',
      4                               length_scale=5.0
      5                               )
      9 # fs.pp.construct_gems_using_nmf(adata,
     10 #                             n_gems = 20,
     11 #                             layer_key = 'count',
     12 #                             random_state =0,
     13 #                              max_iter=20)
     15 commot_output_key = 'commot-cellchat'

File ~/shared/JunyaYang/flowsigenv/lib/python3.11/site-packages/flowsig/preprocessing/_gem_construction.py:88, in construct_gems_using_nsf(adata, n_gems, layer_key, spatial_key, n_inducing_pts, length_scale)
     86 fit.init_loadings(D["Y"], X=Xtr, sz=D["sz"], shrinkage=0.3)
     87 tro = sf.ModelTrainer(fit)
---> 88 tro.train_model(*Dtf, status_freq=50) #about 3 mins
     90 insf = interpret_nsf(fit,Xtr,S=100,lda_mode=False)
     92 adata.uns['nsf_info'] = insf

File ~/shared/JunyaYang/flowsigenv/lib/python3.11/site-packages/spatial_factorization/training.py:310, in ModelTrainer.train_model(self, lr_reduce, maxtry, verbose, ckpt_freq, *args, **kwargs)
    308 except (tf.errors.InvalidArgumentError,NumericalDivergenceError) as err: #cholesky failure
    309   tries+=1
--> 310   if tries==maxtry: raise err
    311   #else: #not yet reached the maximum number of tries
    312   if verbose:

File ~/shared/JunyaYang/flowsigenv/lib/python3.11/site-packages/spatial_factorization/training.py:304, in ModelTrainer.train_model(self, lr_reduce, maxtry, verbose, ckpt_freq, *args, **kwargs)
    302 while tries < maxtry:
    303   try:
--> 304     self._train_model_fixed_lr(mgr, *args, ptic=ptic, wtic=wtic,
    305                                verbose=verbose, ckpt_freq=ckpt_freq,
    306                                **kwargs)
    307     if self.epoch>=len(self.loss["train"])-1: break #finished training
    308   except (tf.errors.InvalidArgumentError,NumericalDivergenceError) as err: #cholesky failure

File ~/shared/JunyaYang/flowsigenv/lib/python3.11/site-packages/spatial_factorization/training.py:232, in ModelTrainer._train_model_fixed_lr(self, ckpt_mgr, Dtrain, Ntr, Dval, S, verbose, num_epochs, ptic, wtic, ckpt_freq, kernel_hp_update_freq, status_freq, span, tol, pickle_freq)
    230 self.loss["train"][i] = trl
    231 if not np.isfinite(trl) or trl>self.loss["train"][1]:
--> 232   raise NumericalDivergenceError
    233 if i%status_freq==0 or i==num_epochs:
    234   if Dval:

NumericalDivergenceError:
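The traceback shows the retry logic in `spatial_factorization`'s `ModelTrainer.train_model`. Each Cholesky failure or `NumericalDivergenceError` increments `tries`, the error is re-raised once `tries==maxtry`, and, judging from the log, the learning rate is reduced before the next attempt. The sketch below is a simplified, TensorFlow-free reconstruction of that control flow based only on the visible frames; it is not the library's code. It restarts from scratch instead of restoring a checkpoint and omits the `tf.errors.InvalidArgumentError` branch. `step_fn`, `train_fixed_lr` and `train_with_retries` are hypothetical names. The defaults `lr=0.01`, `lr_reduce=0.5` and `maxtry=10` are assumptions, chosen because 0.01 × 0.5^9 ≈ 1.95e-05, the learning rate printed at try 9 in the log above.

```python
import math


class NumericalDivergenceError(Exception):
    """Stand-in for the exception raised at training.py:232 in the traceback."""


def train_fixed_lr(step_fn, lr, num_epochs, losses):
    """Run every epoch at one learning rate; raise if the loss diverges."""
    for i in range(1, num_epochs + 1):
        trl = step_fn(lr, i)
        losses[i] = trl
        # Same test as training.py:231: a non-finite loss, or a loss larger
        # than the first epoch's loss, is treated as divergence.
        if not math.isfinite(trl) or trl > losses[1]:
            raise NumericalDivergenceError(f"epoch {i}, lr={lr:.2e}")


def train_with_retries(step_fn, lr=0.01, lr_reduce=0.5, maxtry=10,
                       num_epochs=5, verbose=True):
    """Retry with a smaller learning rate after each divergence (assumed defaults)."""
    losses = [math.nan] * (num_epochs + 1)  # index 0 is unused in this sketch
    tries = 0
    while tries < maxtry:
        try:
            train_fixed_lr(step_fn, lr, num_epochs, losses)
            return lr, losses  # finished training
        except NumericalDivergenceError:
            tries += 1
            if tries == maxtry:
                raise  # corresponds to `if tries==maxtry: raise err`
            lr *= lr_reduce  # shrink the step size before the next attempt
            if verbose:
                print(f"numerical instability (try {tries})")
                print(f"learning rate: {lr:.2e}")
```

A step function that always returns `nan` fails on every attempt, and the sketch re-raises on the tenth try, mirroring the final `NumericalDivergenceError` above.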

I have also tried different versions of Python and TensorFlow.

My environment is:

Package Version


absl-py 2.1.0
adjustText 1.2.0
aiobotocore 2.5.4
aiohappyeyeballs 2.4.3
aiohttp 3.10.11
aioitertools 0.12.0
aiosignal 1.3.1
anndata 0.10.9
annotated-types 0.7.0
annoy 1.17.3
anyio 4.6.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
array_api_compat 1.8
arrow 1.3.0
asciitree 0.3.3
asttokens 2.4.1
astunparse 1.6.3
async-lru 2.0.4
attrs 24.2.0
babel 2.16.0
beautifulsoup4 4.12.3
biothings-client 0.3.1
bleach 6.1.0
bokeh 3.6.0
botocore 1.31.17
cachetools 5.5.0
causaldag 0.1a163
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorcet 3.1.0
comm 0.2.2
conditional-independence 0.1a6
contourpy 1.3.0
cycler 0.12.1
dask 2024.11.2
dask-expr 1.1.19
dask-image 2024.5.3
dataclasses 0.6
datashader 0.16.3
debugpy 1.8.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.9
distributed 2024.11.2
dm-tree 0.1.8
docopt 0.6.2
docrep 0.3.2
einops 0.8.0
et-xmlfile 1.1.0
executing 2.1.0
fasteners 0.19
fastjsonschema 2.20.0
filelock 3.16.1
flatbuffers 24.3.25
flowsig 0.1.2
fonttools 4.54.1
fqdn 1.5.1
frozendict 2.4.4
frozenlist 1.4.1
fsspec 2023.6.0
ftpretty 0.4.0
gast 0.6.0
geopandas 1.0.1
goatools 1.4.12
google-ai-generativelanguage 0.6.10
google-api-core 2.22.0
google-api-python-client 2.151.0
google-auth 2.35.0
google-auth-httplib2 0.2.0
google-generativeai 0.8.2
google-pasta 0.2.0
googleapis-common-protos 1.65.0
graphical-model-learning 0.1a8
graphical-models 0.1a21
grpcio 1.68.0
grpcio-status 1.68.0
h11 0.14.0
h5py 3.12.1
h5sparse 0.1.0
holoviews 1.20.0
httpcore 1.0.7
httplib2 0.22.0
httpx 0.27.2
idna 3.10
igraph 0.11.6
imageio 2.35.1
importlib_metadata 8.5.0
inflect 7.4.0
ipdb 0.13.13
ipykernel 6.29.5
ipython 8.27.0
ipywidgets 8.1.5
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter 1.1.1
jupyter_client 8.6.3
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.2
jupyter_server_terminals 0.5.3
jupyterlab 4.2.5
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_widgets 3.0.13
keras 3.5.0
kiwisolver 1.4.7
lazy_loader 0.4
legacy-api-wrap 1.4
leidenalg 0.10.2
libclang 18.1.1
linkify-it-py 2.0.3
llvmlite 0.43.0
locket 1.0.0
louvain 0.8.2
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
matplotlib-inline 0.1.7
matplotlib-scalebar 0.8.1
mdit-py-plugins 0.4.2
mdurl 0.1.2
mistune 3.0.2
mizani 0.11.4
ml-dtypes 0.4.1
more-itertools 10.5.0
mpmath 1.3.0
msgpack 1.1.0
multidict 6.1.0
multipledispatch 1.0.0
multiscale_spatial_image 1.0.1
mygene 3.2.2
namex 0.0.8
natsort 8.4.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
notebook 7.2.2
notebook_shim 0.2.4
numba 0.60.0
numcodecs 0.13.0
numexpr 2.10.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.85
nvidia-nvtx-cu12 12.1.105
ome-zarr 0.9.0
omnipath 1.0.8
openpyxl 3.1.5
opt_einsum 3.4.0
optree 0.12.1
overrides 7.7.0
packaging 24.1
pandas 2.2.3
pandocfilters 1.5.1
panel 1.5.4
param 2.1.1
parso 0.8.4
partd 1.4.2
patsy 0.5.6
pexpect 4.9.0
pgmpy 0.1.26
pillow 10.4.0
PIMS 0.7
pip 24.2
platformdirs 4.3.6
plotnine 0.13.6
pooch 1.8.2
progressbar2 4.5.0
prometheus_client 0.21.0
prompt_toolkit 3.0.48
proto-plus 1.24.0
protobuf 5.28.3
psutil 6.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
pyarrow 17.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycparser 2.22
pyct 0.5.0
pydantic 2.9.2
pydantic_core 2.23.4
pydot 3.0.2
pygam 0.9.1
Pygments 2.18.0
pyliger 0.2.3
pynndescent 0.5.13
pyogrio 0.10.0
pyparsing 3.2.0
pyproj 3.7.0
python-dateutil 2.9.0.post0
python-igraph 0.11.6
python-json-logger 2.0.7
python-utils 3.9.0
pytz 2024.2
pyviz_comms 3.0.3
PyYAML 6.0.2
pyzmq 26.2.0
referencing 0.35.1
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.9.4
rpds-py 0.20.0
rsa 4.9
s3fs 2023.6.0
scanpy 1.10.3
scikit-image 0.24.0
scikit-learn 1.5.2
scipy 1.11.4
seaborn 0.13.2
Send2Trash 1.8.3
session_info 1.0.0
setuptools 75.1.0
shapely 2.0.6
six 1.16.0
slicerator 1.1.0
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.6
spatial_factorization 0.0.1
spatial_image 1.1.0
spatialdata 0.2.3
squidpy 1.6.1
stack-data 0.6.3
statsmodels 0.14.3
stdlib-list 0.10.0
sympy 1.13.3
tblib 3.0.0
tensorboard 2.18.0
tensorboard-data-server 0.7.2
tensorflow 2.18.0
tensorflow-io-gcs-filesystem 0.37.1
tensorflow-probability 0.24.0
termcolor 2.4.0
terminado 0.18.1
texttable 1.7.0
tf_keras 2.18.0
threadpoolctl 3.5.0
tifffile 2024.9.20
tinycss2 1.3.0
toolz 1.0.0
torch 2.4.1
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
triton 3.0.0
typeguard 4.3.0
types-python-dateutil 2.9.0.20240906
typing 3.7.4.3
typing_extensions 4.12.2
tzdata 2024.2
uc-micro-py 1.0.3
umap-learn 0.5.6
uri-template 1.3.0
uritemplate 4.1.1
urllib3 1.26.20
validators 0.34.0
wcwidth 0.2.13
webcolors 24.8.0
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.4
wheel 0.44.0
widgetsnbextension 4.0.13
wrapt 1.16.0
xarray 2024.9.0
xarray-dataclasses 1.8.0
xarray-datatree 0.0.14
xarray-schema 0.0.3
xarray-spatial 0.4.0
xgboost 2.1.1
XlsxWriter 3.2.0
xyzservices 2024.9.0
yarl 1.13.1
zarr 2.18.3
zict 3.0.0
zipp 3.20.2
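Comparing a listing like the one above against another environment by eye is tedious. A minimal sketch, assuming both environments are dumped in this same two-column `name version` text format, is to parse each listing into a dictionary and report the packages whose versions differ. The helper names `parse_listing` and `diff_listings` are hypothetical; package names are normalized following PEP 503 so that, for example, `tensorflow_probability` and `tensorflow-probability` match.

```python
import re


def parse_listing(text):
    """Parse 'name version' lines into {normalized_name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and malformed lines.
        if len(parts) != 2 or parts == ["Package", "Version"]:
            continue
        name, version = parts
        # PEP 503 normalization: case-insensitive; runs of '-', '_', '.' are equivalent.
        packages[re.sub(r"[-_.]+", "-", name).lower()] = version
    return packages


def diff_listings(mine, theirs):
    """Return (name, my_version, their_version) for every package that differs.

    A version of None means the package is missing from that environment.
    """
    a, b = parse_listing(mine), parse_listing(theirs)
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]
```

Packages installed in only one environment show up with `None` on the other side, which makes missing dependencies as visible as version mismatches.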

@axelalmet
Owner

Hi AllenYGY,

Sorry that you're running into this error. I've noticed that NSF (non-negative spatial factorization) has been tricky for most users, and I don't have good answers for why it sometimes fails. Do you mind telling me what machine you are running FlowSig on?

I will take a closer look at the packages you have installed, compare them with my own FlowSig environment, and see whether the differences reveal anything.

Best wishes,
Axel.
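The repeated `Cholesky decomposition was not successful` warnings in the log mean TensorFlow was asked to factor a matrix that was not numerically positive definite. As a general illustration only, not a diagnosis of this issue and not how `spatial_factorization` handles the failure internally, the NumPy sketch below builds an RBF kernel matrix from two identical input points, shows that `np.linalg.cholesky` rejects it, and applies a common workaround: adding a small "jitter" to the diagonal. The helpers `rbf_kernel` and `cholesky_with_jitter` and the jitter schedule are hypothetical; `length_scale=5.0` simply mirrors the value passed to `construct_gems_using_nsf` above.

```python
import numpy as np


def rbf_kernel(x, length_scale):
    """Squared-exponential (RBF) kernel matrix for 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def cholesky_with_jitter(K, jitter=1e-6, max_tries=5):
    """Factor K, adding a growing multiple of the identity if Cholesky fails.

    Returns the lower-triangular factor and the jitter that was needed.
    """
    n = K.shape[0]
    for k in range(max_tries + 1):
        eps = 0.0 if k == 0 else jitter * 10 ** (k - 1)
        try:
            return np.linalg.cholesky(K + eps * np.eye(n)), eps
        except np.linalg.LinAlgError:
            continue  # not numerically positive definite; try more jitter
    raise np.linalg.LinAlgError("not positive definite even with jitter")


# Two identical input locations make the kernel matrix exactly singular.
x = np.array([0.0, 0.0, 1.0])
K = rbf_kernel(x, length_scale=5.0)
L, eps = cholesky_with_jitter(K)
print(f"jitter needed: {eps:g}")
```

Jitter trades a tiny perturbation of the kernel for a factorization that succeeds; it does not help if the matrix already contains NaNs from diverged parameters.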

@AllenYGY
Author

Hi Axel,

Thank you for your quick response and for looking into this! I’m running FlowSig on the following machine:

Linux 228461c49223 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Let me know if you need more details about the environment. I really appreciate you taking the time to compare the packages between our setups; hopefully that will help pinpoint the issue. Please let me know if there's anything else I can provide to help with troubleshooting.

Best,
AllenYGY
