1 Error description
1.1 System environment
Hardware Environment (Ascend/GPU/CPU): Ascend
Software Environment:
- MindSpore version (source or binary): 1.8.0
- Python version (e.g., Python 3.7.5): 3.7.6
- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
- GCC/Compiler version (if compiled from source):
1.2 Basic information
1.2.1 Script
The script builds a single-operator network around CTCGreedyDecoder and performs greedy (best path) decoding on the logits given in the input. The import lines are omitted from the listing; it assumes numpy imported as np, mindspore, mindspore.nn as nn, mindspore.ops as ops, and Tensor from mindspore. The script is as follows:
01 class Net(nn.Cell):
02     def __init__(self):
03         super(Net, self).__init__()
04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
05
06     def construct(self, input_x, sequence_length):
07         return self.ctc_greedyDecoder(input_x, sequence_length)
08 net = Net()
09
10
11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
13 sequence_length = Tensor(np.array([4, 2]), mindspore.int32)
14
15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
16 print(decoded_indices, decoded_values, decoded_shape, log_probability)
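To make "greedy (best path) decoding" concrete, here is a minimal pure-Python sketch of the technique. It is not MindSpore's implementation: the function name ctc_greedy_decode is hypothetical, and the sketch assumes that the blank label is the last class (num_classes - 1), that repeated labels are merged, and that ties are resolved in favor of the lowest class index. These assumptions are worth checking against the API reference.

```python
def ctc_greedy_decode(logits, sequence_length, merge_repeated=True):
    """Best-path CTC decoding sketch (hypothetical helper, not the MindSpore kernel).

    logits: nested list shaped (max_time, batch_size, num_classes).
    sequence_length: one length per batch element, each at most max_time.
    Assumption: the blank label is the last class, num_classes - 1.
    """
    max_time = len(logits)
    batch_size = len(logits[0])
    num_classes = len(logits[0][0])
    blank = num_classes - 1

    indices, values, neg_sums = [], [], []
    longest = 0
    for b in range(batch_size):
        steps = sequence_length[b]
        # Reading past the last time step would index out of bounds.
        if not 0 <= steps <= max_time:
            raise ValueError(
                f"sequence_length[{b}]={steps} is outside [0, {max_time}]")
        prev = None
        total = 0.0
        count = 0
        for t in range(steps):
            scores = logits[t][b]
            # Pick the highest-scoring class; ties go to the lowest index.
            best = max(range(num_classes), key=lambda c: scores[c])
            total += scores[best]
            # Drop blanks and, optionally, consecutive repeats.
            if best != blank and not (merge_repeated and best == prev):
                indices.append([b, count])
                values.append(best)
                count += 1
            prev = best
        neg_sums.append([-total])  # negated sum of the per-step maxima
        longest = max(longest, count)
    return indices, values, [batch_size, longest], neg_sums


if __name__ == "__main__":
    logits = [[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
              [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]
    print(ctc_greedy_decode(logits, [2, 2]))
```

For the logits in the script, with both sequences using the two available time steps, this sketch yields the same indices, values, and shape that the operator prints in section 2, with negated sums of about -1.2 and -1.3.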
1.2.2 Error
The error message is as follows:
[ERROR] DEVICE(172230,fffeae7fc160,python):2022-06-28-07:02:12.636.101 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:603] TaskFailCallback] Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr
Traceback (most recent call last):
  File "CTCGreedyDecoder.py", line 26, in <module>
    decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 979, in compile_and_run
    return _cell_graph_executor(self, *new_inputs, phase=self.phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1128, in __call__
    return self.run(obj, *args, phase=phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1165, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 94, in wrapper
    results = fn(*arg, **kwargs)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1147, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: Call runtime rtStreamSynchronize failed. Op name: Default/CTCGreedyDecoder-op2
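Cause analysis
Looking at the log, the ERROR line reads "Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr". This message does not say clearly where the problem lies, so the practical approach is to pick out keywords and test a hypothesis. The keyword here is nullptr, which may point to an out-of-bounds access. A closer look at the parameter descriptions in the official documentation shows that the input has shape (max_time, batch_size, num_classes) and that every value in sequence_length must be less than or equal to max_time. Line 13 of the script sets sequence_length to [4, 2], while the input has shape (2, 2, 3), so max_time is 2. The value 4 does not satisfy this condition, which causes the error.
2 Solution
Based on the cause identified above, the fix is straightforward: change line 13 so that no value in sequence_length exceeds max_time.
01 class Net(nn.Cell):
02     def __init__(self):
03         super(Net, self).__init__()
04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
05
06     def construct(self, input_x, sequence_length):
07         return self.ctc_greedyDecoder(input_x, sequence_length)
08 net = Net()
09
10
11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
13 sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
14
15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
16 print(decoded_indices, decoded_values, decoded_shape, log_probability)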
The script now runs successfully and prints the following output:
[[0 0]
 [0 1]
 [1 0]] [0 1 0] [2 2] [[-1.2]
 [-1.3]]
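A cheap guard before calling the operator can turn this opaque device-side failure into a readable Python exception. The sketch below is not part of MindSpore: the name check_sequence_length is hypothetical, and it assumes the input shape is laid out as (max_time, batch_size, num_classes).

```python
def check_sequence_length(logits_shape, sequence_length):
    """Raise ValueError if any length cannot be decoded (hypothetical helper).

    logits_shape: (max_time, batch_size, num_classes), as assumed above.
    sequence_length: iterable with one integer length per batch element.
    """
    max_time, batch_size, _ = logits_shape
    lengths = [int(n) for n in sequence_length]
    if len(lengths) != batch_size:
        raise ValueError(
            f"expected {batch_size} lengths, got {len(lengths)}")
    # Collect every offending entry so the message names all of them.
    bad = [(i, n) for i, n in enumerate(lengths) if n < 0 or n > max_time]
    if bad:
        raise ValueError(
            f"sequence_length entries {bad} fall outside [0, {max_time}]")


if __name__ == "__main__":
    check_sequence_length((2, 2, 3), [2, 2])  # the corrected script passes
    try:
        check_sequence_length((2, 2, 3), [4, 2])  # the failing script
    except ValueError as err:
        print("rejected:", err)
```

With MindSpore tensors, one way to call it would be check_sequence_length(inputs.shape, sequence_length.asnumpy()) just before net(inputs, sequence_length).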
3 Summary
Steps for locating the problem:
1. Find the line of user code that raised the error: 15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length);
2. Use the keywords in the error log to narrow the scope of the analysis: Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr.
4 References
4.1 CTCGreedyDecoder operator API