NumericalDivergenceError of fs.pp.construct_gems_using_nsf function #13

DBinary · 2024-09-17T06:15:05Z

Hi, when I run the flowsig tutorials with default parameters in a Python 3.8 environment, I get the following error:

Temporary checkpoint directory: /tmp/tmpv1t2fq2f
0002 numerical instability (try 1)
0000 learning rate: 5.00e-03
0002 numerical instability (try 2)
0000 learning rate: 2.50e-03
0002 numerical instability (try 3)
0000 learning rate: 1.25e-03
0002 numerical instability (try 4)
0000 learning rate: 6.25e-04
0002 numerical instability (try 5)
0000 learning rate: 3.12e-04
0002 numerical instability (try 6)
0000 learning rate: 1.56e-04
0002 numerical instability (try 7)
0000 learning rate: 7.81e-05
0002 numerical instability (try 8)
0000 learning rate: 3.91e-05
0002 numerical instability (try 9)
0000 learning rate: 1.95e-05
---------------------------------------------------------------------------
NumericalDivergenceError                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 fs.pp.construct_gems_using_nsf(adata,
      2                             n_gems = 20,
      3                             layer_key = 'count',
      4                             length_scale = 5.0)
      6 commot_output_key = 'commot-cellchat'

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/flowsig/preprocessing/_gem_construction.py:88, in construct_gems_using_nsf(adata, n_gems, layer_key, spatial_key, n_inducing_pts, length_scale)
     86 fit.init_loadings(D["Y"], X=Xtr, sz=D["sz"], shrinkage=0.3)
     87 tro = sf.ModelTrainer(fit)
---> 88 tro.train_model(*Dtf, status_freq=50) #about 3 mins
     90 insf = interpret_nsf(fit,Xtr,S=100,lda_mode=False)
     92 adata.uns['nsf_info'] = insf

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/spatial_factorization/training.py:310, in ModelTrainer.train_model(self, lr_reduce, maxtry, verbose, ckpt_freq, *args, **kwargs)
    308 except (tf.errors.InvalidArgumentError,NumericalDivergenceError) as err: #cholesky failure
    309   tries+=1
--> 310   if tries==maxtry: raise err
    311   #else: #not yet reached the maximum number of tries
    312   if verbose:

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/spatial_factorization/training.py:304, in ModelTrainer.train_model(self, lr_reduce, maxtry, verbose, ckpt_freq, *args, **kwargs)
    302 while tries < maxtry:
    303   try:
--> 304     self._train_model_fixed_lr(mgr, *args, ptic=ptic, wtic=wtic,
    305                                verbose=verbose, ckpt_freq=ckpt_freq,
    306                                **kwargs)
    307     if self.epoch>=len(self.loss["train"])-1: break #finished training
    308   except (tf.errors.InvalidArgumentError,NumericalDivergenceError) as err: #cholesky failure

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/spatial_factorization/training.py:232, in ModelTrainer._train_model_fixed_lr(self, ckpt_mgr, Dtrain, Ntr, Dval, S, verbose, num_epochs, ptic, wtic, ckpt_freq, kernel_hp_update_freq, status_freq, span, tol, pickle_freq)
    230 self.loss["train"][i] = trl
    231 if not np.isfinite(trl) or trl>self.loss["train"][1]:
--> 232   raise NumericalDivergenceError
    233 if i%status_freq==0 or i==num_epochs:
    234   if Dval:

NumericalDivergenceError:
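
For readers following the log, the retry behaviour it shows can be sketched roughly as below. This is a minimal illustration, not the spatial_factorization code: `train_with_lr_backoff` and `train_fn` are hypothetical names, and the starting learning rate (0.01), halving factor (0.5), and retry limit (10) are assumptions inferred from the printed learning rates and try counts.

```python
class NumericalDivergenceError(ValueError):
    """Local stand-in for the exception named in the traceback."""


def train_with_lr_backoff(train_fn, lr=0.01, lr_reduce=0.5, maxtry=10):
    """Call train_fn(lr); on divergence, shrink lr and retry.

    All default values are assumptions inferred from the log above.
    Returns (result, lr_used) on success and re-raises the error once
    maxtry attempts have failed.
    """
    tries = 0
    while tries < maxtry:
        try:
            return train_fn(lr), lr
        except NumericalDivergenceError:
            tries += 1
            if tries == maxtry:
                raise  # every attempt diverged, give up
            lr *= lr_reduce  # e.g. 1e-2 -> 5e-3 -> 2.5e-3 -> ...
            print(f"numerical instability (try {tries}), learning rate: {lr:.2e}")
```

If every attempt diverges, as in the log above, the final exception propagates to the caller, which is what the notebook cell reports.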

My environment is:
absl-py 2.1.0
adjustText 1.2.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
anndata 0.9.2
annoy 1.17.3
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asciitree 0.3.3
asttokens 2.4.1
astunparse 1.6.3
async-lru 2.0.4
async-timeout 4.0.3
attrs 24.2.0
babel 2.16.0
backcall 0.2.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.12.3
biothings-client 0.3.1
bleach 6.1.0
bokeh 3.1.1
cachetools 5.5.0
causaldag 0.1a163
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorcet 3.1.0
comm 0.2.2
conditional-independence 0.1a6
contourpy 1.1.1
cycler 0.12.1
dask 2023.5.0
dask-image 2023.3.0
dataclasses 0.6
datashader 0.15.2
datashape 0.5.2
debugpy 1.8.5
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
dm-tree 0.1.8
docopt 0.6.2
docrep 0.3.2
einops 0.8.0
et-xmlfile 1.1.0
exceptiongroup 1.2.2
executing 2.1.0
fasteners 0.19
fastjsonschema 2.20.0
filelock 3.16.0
flatbuffers 24.3.25
flowsig 0.1.0
fonttools 4.53.1
fqdn 1.5.1
frozendict 2.4.4
frozenlist 1.4.1
fsspec 2024.9.0
ftpretty 0.4.0
gast 0.4.0
get-annotations 0.1.2
goatools 1.4.12
google-ai-generativelanguage 0.1.0
google-api-core 2.19.2
google-auth 2.34.0
google-auth-oauthlib 1.0.0
google-generativeai 0.1.0rc1
google-pasta 0.2.0
googleapis-common-protos 1.65.0
graphical-model-learning 0.1a8
graphical-models 0.1a21
grpcio 1.66.1
grpcio-status 1.62.3
h11 0.14.0
h5py 3.11.0
h5sparse 0.1.0
holoviews 1.17.1
httpcore 1.0.5
httpx 0.27.2
idna 3.10
igraph 0.11.6
imageio 2.35.1
importlib_metadata 8.5.0
importlib_resources 6.4.5
inflect 7.0.0
ipdb 0.13.13
ipykernel 6.29.5
ipython 8.12.3
ipywidgets 8.1.5
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter 1.1.1
jupyter_client 8.6.2
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.2
jupyter_server_terminals 0.5.3
jupyterlab 4.2.5
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_widgets 3.0.13
keras 2.13.1
kiwisolver 1.4.7
lazy_loader 0.4
leidenalg 0.10.2
libclang 18.1.1
linkify-it-py 2.0.3
llvmlite 0.41.1
locket 1.0.0
louvain 0.8.2
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.7.5
matplotlib-inline 0.1.7
matplotlib-scalebar 0.8.1
mdit-py-plugins 0.4.2
mdurl 0.1.2
mistune 3.0.2
mizani 0.9.3
mpmath 1.3.0
multidict 6.1.0
multipledispatch 1.0.0
mygene 3.2.2
natsort 8.4.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.1
notebook 7.2.2
notebook_shim 0.2.4
numba 0.58.1
numcodecs 0.12.1
numexpr 2.8.6
numpy 1.24.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.6.68
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
omnipath 1.0.8
openpyxl 3.1.5
opt-einsum 3.3.0
overrides 7.7.0
packaging 24.1
pandas 2.0.3
pandocfilters 1.5.1
panel 1.2.3
param 2.1.1
parso 0.8.4
partd 1.4.1
patsy 0.5.6
pexpect 4.9.0
pgmpy 0.1.26
pickleshare 0.7.5
pillow 10.4.0
PIMS 0.7
pip 24.2
pkgutil_resolve_name 1.3.10
platformdirs 4.3.3
plotnine 0.12.4
progressbar2 4.5.0
prometheus_client 0.20.0
prompt_toolkit 3.0.47
proto-plus 1.24.0
protobuf 4.25.4
psutil 6.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycparser 2.22
pyct 0.5.0
pydantic 1.10.18
pydot 3.0.1
pygam 0.9.1
Pygments 2.18.0
pyliger 0.2.0
pynndescent 0.5.13
pyparsing 3.1.4
python-dateutil 2.9.0.post0
python-igraph 0.11.6
python-json-logger 2.0.7
python-utils 3.8.2
pytz 2024.2
pyviz_comms 3.0.3
PyWavelets 1.4.1
PyYAML 6.0.2
pyzmq 26.2.0
referencing 0.35.1
requests 2.32.3
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.8.1
rpds-py 0.20.0
rsa 4.9
scanpy 1.9.8
scikit-image 0.21.0
scikit-learn 1.3.2
scipy 1.10.1
seaborn 0.13.2
Send2Trash 1.8.3
session_info 1.0.0
setuptools 73.0.1
six 1.16.0
slicerator 1.1.0
sniffio 1.3.1
soupsieve 2.6
spatial-factorization 0.0.1
squidpy 1.2.2
stack-data 0.6.3
statsmodels 0.14.1
stdlib-list 0.10.0
sympy 1.13.2
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorflow 2.13.1
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-probability 0.21.0
termcolor 2.4.0
terminado 0.18.1
texttable 1.7.0
threadpoolctl 3.5.0
tifffile 2023.7.10
tinycss2 1.3.0
tomli 2.0.1
toolz 0.12.1
torch 2.1.2
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
triton 2.1.0
types-python-dateutil 2.9.0.20240906
typing 3.7.4.3
typing_extensions 4.5.0
tzdata 2024.1
uc-micro-py 1.0.3
umap-learn 0.5.6
uri-template 1.3.0
urllib3 2.2.3
validators 0.34.0
wcwidth 0.2.13
webcolors 24.8.0
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.4
wheel 0.44.0
widgetsnbextension 4.0.13
wrapt 1.16.0
xarray 2023.1.0
xgboost 2.1.1
XlsxWriter 3.2.0
xyzservices 2024.9.0
yarl 1.11.1
zarr 2.16.1
zipp 3.20.2


axelalmet · 2024-09-24T15:35:00Z

Hi DBinary,

I double-checked this example on my laptop and the training appears to converge. I think the issue has to do with TensorFlow and its related packages, e.g. tensorflow-probability. Would you mind updating tensorflow and tensorflow-probability in particular and trying again?

wolfQK · 2024-09-25T09:39:54Z

The `trl` variable at line 227 of training.py in the spatial_factorization package becomes NaN. Could this be unrelated to the TensorFlow version?
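
For context, the check shown at lines 231-232 of the traceback raises when the training loss is non-finite or exceeds the loss recorded at index 1. A small standalone sketch of that condition follows. `check_loss` is a hypothetical helper, and `math.isfinite` stands in for the `np.isfinite` call in the original so the snippet needs no third-party packages.

```python
import math


class NumericalDivergenceError(ValueError):
    """Local stand-in for the exception named in the traceback."""


def check_loss(train_losses, i):
    """Raise if the loss at index i is NaN/inf or above the loss at index 1.

    Mirrors the condition visible in the traceback; the function itself
    is illustrative and not part of spatial_factorization.
    """
    trl = train_losses[i]
    # Comparisons with NaN are always False, so the isfinite test is the
    # clause that catches a NaN loss.
    if not math.isfinite(trl) or trl > train_losses[1]:
        raise NumericalDivergenceError(f"loss at index {i} is {trl}")
```

A NaN loss therefore triggers the error no matter which learning rate is in use, which would be consistent with all nine retries failing at the same step.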

wolfQK · 2024-09-26T01:32:29Z

I tried Python 3.12, TensorFlow 2.17.0, and TensorFlow Probability 0.24.0, and training works with that combination. @axelalmet
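
To compare an environment against the combination reported here, the installed versions can be read with the standard library. This is only a convenience sketch: `installed_versions` is a hypothetical helper, and the package names are taken from the environment listing earlier in this thread.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("python", sys.version.split()[0])
for name, version in installed_versions(
    ["tensorflow", "tensorflow-probability", "spatial-factorization", "flowsig"]
).items():
    print(name, version)
```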

axelalmet · 2024-09-26T13:21:53Z

Hi wolfQK,

I'm glad you got it to work! It's good for me to know that NSF works with newer versions of Python and TensorFlow. I'll modify the installation requirements accordingly.

Thanks!
