loss: nan, iter:1/455(1, 1.076s) #8

thuxugang · 2016-11-15T07:49:39Z

Hello, why is the CTC loss nan from the very first iteration when I run your program? I would appreciate any guidance, thank you very much.
Here is the output:
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, cuDNN not available)
C:\Anaconda\lib\site-packages\theano\tensor\signal\downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
loaded 29143 samples from D:\xugang\OCR\cnn-lstm-ctc-master\dataset\english_sentence\train_img_list.txt
loaded 2914 samples from D:\xugang\OCR\cnn-lstm-ctc-master\dataset\english_sentence\val_img_list.txt
building symbolic tensors(0.0799999237061)
setting parameters(0.0799999237061)
('n_classes: ', 95)
('multi-step: ', set([79625, 68250, 45500]))
building the model(0.0799999237061)
computing updates and function(0.240000009537)
using normal sgd and learning_rate:0.00999999977648
('bw_lstm_b', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('fw_lstm_W', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('fw_lstm_U', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('fw_lstm_b', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('bw_lstm_W', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('bw_lstm_U', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('hidden_b', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
('hidden_W', <class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>)
building training function(1.78999996185)
building validating function(29.6099998951)
begin to train(32.8609998226)
.epoch 1/200 begin(32.861)
[prefetch]height: 28, x_max_step:141.0, y_max_width:50
D:\xugang\OCR\cnn-lstm-ctc-master - 1.0\layers\utee.py:137: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
x = np.zeros((batch_size, 1, height, x_max_len)). astype(config.floatX)
D:\xugang\OCR\cnn-lstm-ctc-master - 1.0\layers\utee.py:138: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
x_mask = np.zeros((batch_size, x_max_len)).astype(config.floatX)
..loss: nan, iter:1/455(1, 1.076s)
..detect nan
..loss: nan, iter:1/455(1.076)
Traceback (most recent call last):
File "D:\xugang\OCR\cnn-lstm-ctc-master - 1.0\train.py", line 150, in
sys.exit()
SystemExit

aaron-xichen · 2016-11-16T03:10:35Z

Maybe you can try this
[global]
device=gpu0
floatX=float32
Btw, please remember to back up your original .theanorc first.
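A minimal sketch of that backup-then-replace step, in case it helps. The helper name `backup_and_write_theanorc` and the `.bak` suffix are made up for illustration and are not part of this repository or of Theano. The config path is also an assumption, so pass whichever file your Theano install actually reads.

```python
import shutil
from pathlib import Path


def backup_and_write_theanorc(rc_path, device="gpu0", float_x="float32"):
    """Copy an existing config to <name>.bak, then write a minimal [global] section.

    Hypothetical helper for illustration only. Returns the backup path,
    or None when there was no existing file to back up.
    """
    rc_path = Path(rc_path)
    backup_path = None
    if rc_path.exists():
        # Keep the original settings so they can be restored later.
        backup_path = rc_path.with_name(rc_path.name + ".bak")
        shutil.copy2(rc_path, backup_path)
    # Write only the two settings suggested above.
    rc_path.write_text(
        "[global]\n"
        "device={}\n"
        "floatX={}\n".format(device, float_x)
    )
    return backup_path
```

Calling `backup_and_write_theanorc(Path.home() / ".theanorc")` would overwrite the file in place, so check that the `.bak` copy exists before you start training.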

thuxugang · 2016-11-16T06:07:00Z

Thanks for the guidance. I just tried it and it still does not work. Does it work on your side?
My configuration is:
[global]
openmp = False
device = gpu
floatX = float32
allow_input_downcast=True
[blas]
ldflags =
[gcc]
cxxflags = -IC:\Anaconda\MinGW
[nvcc]
flags = -LC:\Anaconda\libs
compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
fastmath = True

thuxugang · 2016-11-16T12:06:32Z

By the way, I am using OpenCV 2.4.11. Could that have any effect? Thanks.

aaron-xichen · 2016-12-07T14:53:41Z

Sorry for the late reply. Please set fastmath = False and try again.
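One way to flip that flag without hand-editing the file is sketched below. `set_fastmath` is a hypothetical helper, not something shipped with this repository or with Theano. It assumes the config file uses plain INI syntax like the one posted above. The likely reason the flag matters is that fast-math compiles CUDA kernels with approximate `exp`/`log`/division, which can be fragile for a CTC loss, but that is a guess and has not been confirmed in this thread.

```python
import configparser


def set_fastmath(rc_path, enabled=False):
    """Set fastmath in the [nvcc] section of an INI-style Theano config.

    Hypothetical helper for illustration only. Note that rewriting the file
    with configparser drops any comments it contained.
    """
    # interpolation=None keeps '%' in Windows paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve the case of option names such as floatX.
    parser.optionxform = str
    parser.read(rc_path)
    if not parser.has_section("nvcc"):
        parser.add_section("nvcc")
    parser.set("nvcc", "fastmath", str(enabled))
    with open(rc_path, "w") as handle:
        parser.write(handle)
    return parser.get("nvcc", "fastmath")
```

After running it, the `[nvcc]` section should contain `fastmath = False`. Theano may still reuse kernels it compiled earlier, so if the loss stays nan it might be worth clearing its compilation cache before retrying.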
