Train DeepSpeech on Common Voice dataset gives error on gpu
I'm trying to train DeepSpeech model on Common Voice dataset as it's stated in documentation. But it gives the following error:
I0421 11:34:32.779112 140581195995008 utils.py:157] NumExpr defaulting to 2 threads.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
self._extend_graph()
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
ds_train.run_script()
File "/content/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
absl.app.run(main)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/content/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
train()
File "/content/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
load_or_init_graph_for_training(session)
File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
_load_or_init_impl(session, methods, allow_drop_layers=True)
File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 112, in _load_or_init_impl
return _initialize_all_variables(session)
File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 88, in _initialize_all_variables
session.run(v.initializer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
My local machine spec is as follows:
python 3.7; Cuda 10.1; CuDNN 7.6.5; tensorflow-gpu 1.15.2; GPU GTX 1050 ti
I'm also installing the following packages and libraries to prepare the environment:
!apt-add-repository universe
!apt-get install sox libsox-fmt-mp3 cmake libblkid-dev e2fslibs-dev libboost-all-dev libaudit-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
!python3.7 -m pip install sox
!python3.7 -m pip install deepspeech-gpu
!python3.7 -m pip install tensorflow-gpu==1.15.2
!python3.7 -m pip install numpy==1.19.5
!python3.7 -m pip install progressbar2
!python3.7 -m pip install progressbar
!python3.7 -m pip install progressbar33
!python3.7 -m pip install ds_ctcdecoder==0.10.0-alpha.3
!python3.7 -m pip install pyogg==0.6.14a1
!python3.7 -m pip install deepspeech
!git clone --branch v0.9.3 https://github.com/mozilla/DeepSpeech
!python3.7 -m pip install --upgrade --force-reinstall -e ./DeepSpeech/
!git clone https://github.com/kpu/kenlm.git
!mkdir -p build
!cmake kenlm
!make -j 4
!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-checkpoint.tar.gz
!curl -LO "https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/native_client.amd64.cuda.linux.tar.xz"
!mkdir native_client
!tar xvf native_client.amd64.cuda.linux.tar.xz -C native_client
I'm having the same problem both on my local machine and on google colab vm.
EDIT: I also changed my cuda and cudnn versions to 10.0 and 7.5.6, respectively. But the error already exists.
from Recent Questions - Stack Overflow https://ift.tt/3xn56v0
https://ift.tt/eA8V8J
Comments
Post a Comment