NumericalDivergenceError of fs.pp.construct_gems_using_nsf function #13

Open
DBinary opened this issue Sep 17, 2024 · 4 comments


@DBinary

DBinary commented Sep 17, 2024

Hi, when I run the flowsig tutorials with the default parameters in a Python 3.8 environment, I get the following error:

Temporary checkpoint directory: /tmp/tmpv1t2fq2f
0002 numerical instability (try 1)
0000 learning rate: 5.00e-03
0002 numerical instability (try 2)
0000 learning rate: 2.50e-03
0002 numerical instability (try 3)
0000 learning rate: 1.25e-03
0002 numerical instability (try 4)
0000 learning rate: 6.25e-04
0002 numerical instability (try 5)
0000 learning rate: 3.12e-04
0002 numerical instability (try 6)
0000 learning rate: 1.56e-04
0002 numerical instability (try 7)
0000 learning rate: 7.81e-05
0002 numerical instability (try 8)
0000 learning rate: 3.91e-05
0002 numerical instability (try 9)
0000 learning rate: 1.95e-05
---------------------------------------------------------------------------
NumericalDivergenceError                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 fs.pp.construct_gems_using_nsf(adata,
      2                             n_gems = 20,
      3                             layer_key = 'count',
      4                             length_scale = 5.0)
      6 commot_output_key = 'commot-cellchat'

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/flowsig/preprocessing/_gem_construction.py:88, in construct_gems_using_nsf(adata, n_gems, layer_key, spatial_key, n_inducing_pts, length_scale)
     86 fit.init_loadings(D["Y"], X=Xtr, sz=D["sz"], shrinkage=0.3)
     87 tro = sf.ModelTrainer(fit)
---> 88 tro.train_model(*Dtf, status_freq=50) #about 3 mins
     90 insf = interpret_nsf(fit,Xtr,S=100,lda_mode=False)
     92 adata.uns['nsf_info'] = insf

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/spatial_factorization/training.py:310, in ModelTrainer.train_model(self, lr_reduce, maxtry, verbose, ckpt_freq, *args, **kwargs)
    308 except (tf.errors.InvalidArgumentError,NumericalDivergenceError) as err: #cholesky failure
    309   tries+=1
--> 310   if tries==maxtry: raise err
    311   #else: #not yet reached the maximum number of tries
    312   if verbose:

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/spatial_factorization/training.py:304, in ModelTrainer.train_model(self, lr_reduce, maxtry, verbose, ckpt_freq, *args, **kwargs)
    302 while tries < maxtry:
    303   try:
--> 304     self._train_model_fixed_lr(mgr, *args, ptic=ptic, wtic=wtic,
    305                                verbose=verbose, ckpt_freq=ckpt_freq,
    306                                **kwargs)
    307     if self.epoch>=len(self.loss["train"])-1: break #finished training
    308   except (tf.errors.InvalidArgumentError,NumericalDivergenceError) as err: #cholesky failure

File /opt/miniforge/envs/flowsig/lib/python3.8/site-packages/spatial_factorization/training.py:232, in ModelTrainer._train_model_fixed_lr(self, ckpt_mgr, Dtrain, Ntr, Dval, S, verbose, num_epochs, ptic, wtic, ckpt_freq, kernel_hp_update_freq, status_freq, span, tol, pickle_freq)
    230 self.loss["train"][i] = trl
    231 if not np.isfinite(trl) or trl>self.loss["train"][1]:
--> 232   raise NumericalDivergenceError
    233 if i%status_freq==0 or i==num_epochs:
    234   if Dval:

NumericalDivergenceError:
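
The log and traceback above suggest a retry loop: each time training diverges, the learning rate is halved and training restarts, and the error is re-raised once `maxtry` attempts have failed. Below is a minimal sketch of that pattern, not the actual spatial_factorization code. The starting rate of 0.01, the factor 0.5, and `maxtry=10` are inferred from the printed learning rates, and `train_once` and `always_diverges` are hypothetical stand-ins.

```python
class NumericalDivergenceError(Exception):
    """Stand-in for the exception of the same name in the traceback."""


def train_with_retries(train_once, lr=0.01, lr_reduce=0.5, maxtry=10):
    """Call train_once(lr); on divergence, shrink lr and retry.

    Defaults are assumptions inferred from the log, not library values.
    """
    tries = 0
    while True:
        try:
            return train_once(lr)
        except NumericalDivergenceError:
            tries += 1
            if tries == maxtry:
                raise  # give up, as at training.py line 310 in the traceback
            lr *= lr_reduce  # halve the learning rate and try again


# Hypothetical training step that always diverges, to reproduce the schedule.
attempted = []


def always_diverges(lr):
    attempted.append(lr)
    raise NumericalDivergenceError


try:
    train_with_retries(always_diverges)
except NumericalDivergenceError:
    pass

# Learning rates after each failed try, formatted like the log lines.
print([f"{lr:.2e}" for lr in attempted[1:]])
```

If the loss diverges at every learning rate, as it does here, lowering the rate further is unlikely to help, which points at the inputs or the installed package versions rather than the optimizer settings.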

My environment is:
absl-py 2.1.0
adjustText 1.2.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
anndata 0.9.2
annoy 1.17.3
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asciitree 0.3.3
asttokens 2.4.1
astunparse 1.6.3
async-lru 2.0.4
async-timeout 4.0.3
attrs 24.2.0
babel 2.16.0
backcall 0.2.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.12.3
biothings-client 0.3.1
bleach 6.1.0
bokeh 3.1.1
cachetools 5.5.0
causaldag 0.1a163
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorcet 3.1.0
comm 0.2.2
conditional-independence 0.1a6
contourpy 1.1.1
cycler 0.12.1
dask 2023.5.0
dask-image 2023.3.0
dataclasses 0.6
datashader 0.15.2
datashape 0.5.2
debugpy 1.8.5
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
dm-tree 0.1.8
docopt 0.6.2
docrep 0.3.2
einops 0.8.0
et-xmlfile 1.1.0
exceptiongroup 1.2.2
executing 2.1.0
fasteners 0.19
fastjsonschema 2.20.0
filelock 3.16.0
flatbuffers 24.3.25
flowsig 0.1.0
fonttools 4.53.1
fqdn 1.5.1
frozendict 2.4.4
frozenlist 1.4.1
fsspec 2024.9.0
ftpretty 0.4.0
gast 0.4.0
get-annotations 0.1.2
goatools 1.4.12
google-ai-generativelanguage 0.1.0
google-api-core 2.19.2
google-auth 2.34.0
google-auth-oauthlib 1.0.0
google-generativeai 0.1.0rc1
google-pasta 0.2.0
googleapis-common-protos 1.65.0
graphical-model-learning 0.1a8
graphical-models 0.1a21
grpcio 1.66.1
grpcio-status 1.62.3
h11 0.14.0
h5py 3.11.0
h5sparse 0.1.0
holoviews 1.17.1
httpcore 1.0.5
httpx 0.27.2
idna 3.10
igraph 0.11.6
imageio 2.35.1
importlib_metadata 8.5.0
importlib_resources 6.4.5
inflect 7.0.0
ipdb 0.13.13
ipykernel 6.29.5
ipython 8.12.3
ipywidgets 8.1.5
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter 1.1.1
jupyter_client 8.6.2
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.2
jupyter_server_terminals 0.5.3
jupyterlab 4.2.5
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_widgets 3.0.13
keras 2.13.1
kiwisolver 1.4.7
lazy_loader 0.4
leidenalg 0.10.2
libclang 18.1.1
linkify-it-py 2.0.3
llvmlite 0.41.1
locket 1.0.0
louvain 0.8.2
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.7.5
matplotlib-inline 0.1.7
matplotlib-scalebar 0.8.1
mdit-py-plugins 0.4.2
mdurl 0.1.2
mistune 3.0.2
mizani 0.9.3
mpmath 1.3.0
multidict 6.1.0
multipledispatch 1.0.0
mygene 3.2.2
natsort 8.4.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.1
notebook 7.2.2
notebook_shim 0.2.4
numba 0.58.1
numcodecs 0.12.1
numexpr 2.8.6
numpy 1.24.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.6.68
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
omnipath 1.0.8
openpyxl 3.1.5
opt-einsum 3.3.0
overrides 7.7.0
packaging 24.1
pandas 2.0.3
pandocfilters 1.5.1
panel 1.2.3
param 2.1.1
parso 0.8.4
partd 1.4.1
patsy 0.5.6
pexpect 4.9.0
pgmpy 0.1.26
pickleshare 0.7.5
pillow 10.4.0
PIMS 0.7
pip 24.2
pkgutil_resolve_name 1.3.10
platformdirs 4.3.3
plotnine 0.12.4
progressbar2 4.5.0
prometheus_client 0.20.0
prompt_toolkit 3.0.47
proto-plus 1.24.0
protobuf 4.25.4
psutil 6.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycparser 2.22
pyct 0.5.0
pydantic 1.10.18
pydot 3.0.1
pygam 0.9.1
Pygments 2.18.0
pyliger 0.2.0
pynndescent 0.5.13
pyparsing 3.1.4
python-dateutil 2.9.0.post0
python-igraph 0.11.6
python-json-logger 2.0.7
python-utils 3.8.2
pytz 2024.2
pyviz_comms 3.0.3
PyWavelets 1.4.1
PyYAML 6.0.2
pyzmq 26.2.0
referencing 0.35.1
requests 2.32.3
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.8.1
rpds-py 0.20.0
rsa 4.9
scanpy 1.9.8
scikit-image 0.21.0
scikit-learn 1.3.2
scipy 1.10.1
seaborn 0.13.2
Send2Trash 1.8.3
session_info 1.0.0
setuptools 73.0.1
six 1.16.0
slicerator 1.1.0
sniffio 1.3.1
soupsieve 2.6
spatial-factorization 0.0.1
squidpy 1.2.2
stack-data 0.6.3
statsmodels 0.14.1
stdlib-list 0.10.0
sympy 1.13.2
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorflow 2.13.1
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-probability 0.21.0
termcolor 2.4.0
terminado 0.18.1
texttable 1.7.0
threadpoolctl 3.5.0
tifffile 2023.7.10
tinycss2 1.3.0
tomli 2.0.1
toolz 0.12.1
torch 2.1.2
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
triton 2.1.0
types-python-dateutil 2.9.0.20240906
typing 3.7.4.3
typing_extensions 4.5.0
tzdata 2024.1
uc-micro-py 1.0.3
umap-learn 0.5.6
uri-template 1.3.0
urllib3 2.2.3
validators 0.34.0
wcwidth 0.2.13
webcolors 24.8.0
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.4
wheel 0.44.0
widgetsnbextension 4.0.13
wrapt 1.16.0
xarray 2023.1.0
xgboost 2.1.1
XlsxWriter 3.2.0
xyzservices 2024.9.0
yarl 1.11.1
zarr 2.16.1
zipp 3.20.2

@axelalmet
Owner

Hi DBinary,

I double-checked this example on my laptop, and training converges there. I suspect the issue comes from TensorFlow and its related packages, e.g. tensorflow-probability. Would you mind updating tensorflow and tensorflow-probability and trying again?

@wolfQK

wolfQK commented Sep 25, 2024

The trl variable at line 227 of training.py in the spatial_factorization package becomes nan, so maybe the problem is not related to the TensorFlow version?
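
For reference, the check that raises the error (training.py line 231 in the traceback) can be sketched as a small standalone function. This is an illustration only: `has_diverged` and `reference_loss` are hypothetical names, and `math.isfinite` stands in for the `np.isfinite` call shown in the traceback.

```python
import math


def has_diverged(trl, reference_loss):
    """Mirror of the condition at training.py:231 in the traceback.

    trl is the current training loss; reference_loss plays the role of
    self.loss["train"][1], an early-epoch loss used as the baseline.
    """
    return (not math.isfinite(trl)) or trl > reference_loss


nan = float("nan")

# A nan loss never compares greater than anything, so the isfinite test
# is the part of the condition that actually catches it.
print(nan > 1.0)               # comparison alone misses nan
print(has_diverged(nan, 1.0))  # finite check catches it
print(has_diverged(0.5, 1.0))  # loss below the baseline: keep training
```

So a nan in trl is enough to trigger NumericalDivergenceError on every retry, whatever the learning rate.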

@wolfQK

wolfQK commented Sep 26, 2024

I tried Python 3.12, TensorFlow 2.17.0, and TensorFlow Probability 0.24.0, and they work. @axelalmet
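
To compare an environment against the versions mentioned in this thread, something like the sketch below may help. It uses only the standard library (`importlib.metadata`, available from Python 3.8); `installed_versions` is a hypothetical helper, and the distribution names are taken from the pip list above.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Distributions discussed in this thread.
packages = ["tensorflow", "tensorflow-probability", "spatial-factorization", "flowsig"]
versions = installed_versions(packages)

print("python", sys.version.split()[0])
for name, version in versions.items():
    print(name, version if version is not None else "(not installed)")
```

In this thread, the failing environment had tensorflow 2.13.1 with tensorflow-probability 0.21.0 on Python 3.8, and the working one had TensorFlow 2.17.0 with TensorFlow Probability 0.24.0 on Python 3.12.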

@axelalmet
Owner

Hi wolfQK,

I'm glad you got it to work! It's good for me to know that NSF works with newer versions of Python and TensorFlow. I'll update the installation requirements accordingly.

Thanks!
