TPU Estimator Crashing #12

captain-pool · 2019-06-20T17:56:46Z

Tensorflow version: tensorflow==2.0.0b0
Tensorflow Datasets Version: tfds-nightly==1.0.2.dev201906090105
Tensorflow Hub Version: tf-hub-nightly==0.5.0.dev201905270046

Issue

The code raises
End of sequence [[node input_pipeline_task0/while/IteratorGetNext (defined at image_retraining_tpu.py:139) ]]
for all values of max_steps in TPUEstimator.train(...)

Reproduce the issue

$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=8

The same error is raised for

$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=4

$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=100

$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=500

$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=1000

Line 139

GSOC/E1_TPU_Sample/image_retraining_tpu.py

Lines 135 to 139 in 513a0ec

    
    classifier.train(
        input_fn=lambda params: input_fn(
            mode=tf.estimator.ModeKeys.TRAIN,
            **params),
        max_steps=FLAGS.max_steps)

Log file

Error starts from Line 230 of output.log
output.log

CC: @srjoglekar246 @vbardiovskyg


srjoglekar246 · 2019-06-21T18:36:59Z

This looks like a bug in TPUEstimator. As far as I understand this part of the docs, the Estimator API handles the OutOfRange error from the input data function by stopping iterations (and not raising an exception). TPUEstimator doesn't seem to behave that way yet.
Can you open an issue on TF to cross-check?
Also, does the script work with the try...except block?
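
To make the expected behaviour concrete, here is a minimal plain-Python sketch of a training loop that stops quietly when its input runs out instead of raising. It is only an analogy, not TPUEstimator's or Estimator's actual implementation: StopIteration stands in for the OutOfRange error, and the names batches and train are hypothetical helpers invented for this illustration.

```python
from typing import Iterable, Iterator, List


def batches(examples: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield full batches only; an incomplete final batch is dropped."""
    batch: List[int] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []


def train(input_iter: Iterator[List[int]], max_steps: int) -> int:
    """Run at most max_steps steps and return how many actually ran.

    StopIteration plays the role of the OutOfRange error in this analogy:
    when the input is exhausted, the loop ends instead of propagating it.
    """
    steps_done = 0
    while steps_done < max_steps:
        try:
            next(input_iter)  # a real training step would consume this batch
        except StopIteration:
            # Input exhausted: stop training rather than crash.
            break
        steps_done += 1
    return steps_done


# 100 examples in batches of 32 give 3 full batches, so a request for
# 8 steps ends after 3 steps without an exception.
steps = train(batches(range(100), 32), max_steps=8)
```

In this sketch, asking for more steps than the data can supply simply returns the number of steps that ran, which is the behaviour described above for the regular Estimator.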

captain-pool · 2019-06-22T03:11:30Z

Nope, it doesn't. Weirdly enough, the code doesn't stop running: it keeps reporting that the TPU is healthy and tries to refresh the token, and it never exits, even though there is no more code to execute.

captain-pool mentioned this issue Jun 25, 2019

[TF 2.0] TPU Estimator cannot function without steps or max_steps tensorflow/tensorflow#30148

Closed
