How to change batch size? #55

Open
hiwasawa0715 opened this issue Aug 25, 2019 · 2 comments

@hiwasawa0715

I tried to run "train.py" (step 5, "train", in the usage instructions), but I get the following error: "ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape".

Should I make the batch size smaller? How do I change the batch size?

[My environment]
GPU: GeForce GTX 1050 Ti
Memory: 16 GiB
Swap: 16 GiB
OS: Ubuntu 16.04
CUDA 9.0
cuDNN 7.5.0
[Anaconda3]
Python 3.6
tensorflow-gpu 1.12.0
scikit-learn 0.21.3
open3d-python 0.7.0.0


2019-08-25 14:54:06.964697: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *******************************xx***
2019-08-25 14:54:06.964735: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_ops.cc:446 : Resource exhausted: OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node fa_layer4/conv_2/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fa_layer4/conv_1/Relu, fa_layer4/conv_2/weights/read/_211)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad/_433}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4838_gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 469, in
train()
File "train.py", line 437, in train
train_one_epoch(sess, ops, train_writer, stack_train)
File "train.py", line 243, in train_one_epoch
feed_dict=feed_dict,
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node fa_layer4/conv_2/Conv2D (defined at /home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/tf_util.py:186) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fa_layer4/conv_1/Relu, fa_layer4/conv_2/weights/read/_211)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad/_433}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4838_gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'fa_layer4/conv_2/Conv2D', defined at:
File "train.py", line 469, in
train()
File "train.py", line 359, in train
bn_decay=bn_decay,
File "/home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/model.py", line 128, in get_model
scope="fa_layer4",
File "/home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/pointnet_util.py", line 323, in pointnet_fp_module
bn_decay=bn_decay,
File "/home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/tf_util.py", line 186, in conv2d
data_format=data_format,
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 957, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node fa_layer4/conv_2/Conv2D (defined at /home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/tf_util.py:186) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fa_layer4/conv_1/Relu, fa_layer4/conv_2/weights/read/_211)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad/_433}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4838_gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
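
For what it's worth, the hint in the traceback can be followed literally: passing report_tensor_allocations_upon_oom in a RunOptions proto to sess.run() makes TensorFlow list the allocated tensors when the OOM happens. Below is a minimal, self-contained sketch, not the repo's training loop; the tensor shape is just borrowed from the failing op, and the same options argument could be added to the sess.run() call in train_one_epoch() in train.py.

```python
import tensorflow as tf

# Minimal sketch (not the repo's code): follow the hint above by passing
# report_tensor_allocations_upon_oom in RunOptions so TensorFlow lists the
# allocated tensors if this run hits an OOM.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Dummy graph using the shape from the failing tensor, for illustration only.
x = tf.random_normal([16, 128, 8192, 1])
y = tf.reduce_sum(x)

with tf.Session() as sess:
    # Pass options=run_options to get allocation info on OOM.
    print(sess.run(y, options=run_options))
```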

@windcatcher

me too

@windcatcher

windcatcher commented Oct 21, 2019

My device is a GTX 1050. I solved it by changing the batch size to 1; the parameter is in the semantic.json file.
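
For reference, here is a minimal sketch of applying that fix by script. The batch-size key name in semantic.json is an assumption, so check the actual file; editing the value by hand works just as well.

```python
import json

# Hedged sketch: lower the batch size in semantic.json to 1 so training fits
# on a 4 GB GPU such as the GTX 1050 / 1050 Ti.
with open("semantic.json") as f:
    config = json.load(f)

config["batch_size"] = 1  # assumed key name; use the one your semantic.json actually contains

with open("semantic.json", "w") as f:
    json.dump(config, f, indent=4)
```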
