Should we use BackgroundGenerator when we've had DataLoader? #5

Open · yzhang1918 opened this issue Apr 29, 2019 · 6 comments

@yzhang1918

I really enjoy this guide! However, I am not sure what the advantage of prefetch_generator is. It seems that DataLoader in pytorch has already supported prefetching.

Thank you!

@IgorSusmelj (Owner)

To the best of my knowledge, the DataLoader in PyTorch creates a set of worker threads which all prefetch new data at once when all workers are empty.

So if, for example, you create 8 worker threads:

  1. All 8 threads prefetch data
  2. Until you empty all of them (for example, by running 8 training iterations), none of the workers fetches new data

Using the prefetch generator, we make sure that each of those workers always has at least one additional data item loaded.
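
(For context, the basic pattern looks roughly like the sketch below. The slow_batches generator is a hypothetical stand-in for a slow data pipeline, not code from this repo; BackgroundGenerator simply consumes the wrapped iterator in a separate thread and keeps up to max_prefetch items ready.)

import time
from prefetch_generator import BackgroundGenerator

# hypothetical stand-in for slow data loading / augmentation
def slow_batches(n=6):
    for i in range(n):
        time.sleep(0.1)  # pretend this is disk I/O or preprocessing
        yield i

# the wrapped iterator runs in a background thread, so loading overlaps
# with whatever work happens in the loop body (e.g. the training step)
for batch in BackgroundGenerator(slow_batches(), max_prefetch=1):
    time.sleep(0.1)  # pretend this is the forward/backward pass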

You can see this behavior if you create a very shallow network.

I have here two colab notebooks (based on the CIFAR10 example from the official tutorial):

Here with data loader and 2 workers: https://colab.research.google.com/drive/10wJIfCw5moPc-Yx9rSqWFEXkNceAOPpc

Here with the additional prefetch_generator:
https://colab.research.google.com/drive/1WQ8c-RIZ7FMhfsm8dtRpsqiIR_KuZ49Z

Output without prefetch_generator     Output with prefetch_generator
Compute efficiency: 0.09, iter 1      Compute efficiency: 0.61, iter 1
Compute efficiency: 0.98, iter 2      Compute efficiency: 0.99, iter 2
Compute efficiency: 0.61, iter 3      Compute efficiency: 0.98, iter 3
Compute efficiency: 0.98, iter 4      Compute efficiency: 0.99, iter 4
Compute efficiency: 0.67, iter 5      Compute efficiency: 0.99, iter 5
Compute efficiency: 0.71, iter 6      Compute efficiency: 0.99, iter 6
Avg time per epoch: 328ms             Avg time per epoch: 214ms

This is why keeping track of compute vs. data-loading time (a.k.a. compute efficiency) is important. In this simple example, we even save a lot of training time.
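
(For reference, "compute efficiency" here is the fraction of each iteration spent on the actual training step rather than waiting for the next batch. Below is a minimal sketch of how such a ratio could be measured; the names are illustrative, not the notebook's actual code.)

import time

def train_epoch(loader, train_step):
    t_data, t_compute = 0.0, 0.0
    t0 = time.perf_counter()
    for batch in loader:
        t1 = time.perf_counter()
        t_data += t1 - t0        # time spent waiting for the next batch
        train_step(batch)
        t0 = time.perf_counter()
        t_compute += t0 - t1     # time spent in the actual training step
    print(f"Compute efficiency: {t_compute / (t_compute + t_data):.2f}")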

If anyone knows how to fix this behavior in the PyTorch data loader let me know :)

@yzhang1918 (Author)

Thank you for your wonderful example!
Now I use the following class to replace the default DataLoader everywhere in my code. XD

from torch.utils.data import DataLoader
from prefetch_generator import BackgroundGenerator

# Drop-in replacement for DataLoader: the wrapped iterator runs in a background
# thread, so the next batch is already being prepared while the current one is used.
class DataLoaderX(DataLoader):

    def __iter__(self):
        return BackgroundGenerator(super().__iter__())
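
(A hypothetical usage sketch: DataLoaderX is meant as a drop-in replacement wherever DataLoader is used; the toy TensorDataset below is only for illustration.)

import torch
from torch.utils.data import TensorDataset

# toy dataset, purely illustrative
dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
loader = DataLoaderX(dataset, batch_size=64, shuffle=True, num_workers=2)

for images, labels in loader:
    pass  # the background thread keeps the next batch ready while this body runs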

@ryul99 commented Jun 23, 2020

I had a problem using BackgroundGenerator with PyTorch Distributed Data Parallel.
When I turn on both DDP and BackgroundGenerator and iterate over the dataloader, processes that are not rank 0 load something onto the rank 0 GPU.
I solved this issue by turning off BackgroundGenerator when using DDP.

@ppwwyyxx

the DataLoader in PyTorch creates a set of worker threads

Technically no, it creates worker processes.

Until you empty all of them (for example, by running 8 training iterations), none of the workers fetches new data

PyTorch does not do this.

I have here two colab notebooks (based on the CIFAR10 example from the official tutorial):
Here with data loader and 2 workers: https://colab.research.google.com/drive/10wJIfCw5moPc-Yx9rSqWFEXkNceAOPpc
Here with the additional prefetch_generator:

This is a flawed benchmark that doesn't actually show the importance of prefetching; it runs fastest without any prefetching: when setting num_workers=0 and NOT using BackgroundGenerator, it prints 150ms, which is faster than what's shown in either colab notebook.

@IgorSusmelj (Owner)

A quick update on this one: PyTorch 1.7 introduced a configurable prefetching parameter (prefetch_factor) for the DataLoader: https://pytorch.org/docs/stable/data.html

I haven't done any benchmarking yet, but I can imagine that the integrated prefetching makes this prefetch_generator obsolete for PyTorch.
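
(For illustration, a minimal sketch of the built-in prefetching, assuming PyTorch >= 1.7: prefetch_factor is per worker and only valid when num_workers > 0; the toy dataset is just a placeholder.)

import torch
from torch.utils.data import DataLoader, TensorDataset

# placeholder dataset for illustration
dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))

# each worker keeps up to prefetch_factor batches ready in advance
loader = DataLoader(dataset, batch_size=64, num_workers=2, prefetch_factor=4)

for images, labels in loader:
    pass  # training step would go here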

@DZ9 commented Mar 16, 2021

I had a problem using BackgroundGenerator with PyTorch Distributed Data Parallel.
When I turn on both DDP and BackgroundGenerator and iterate over the dataloader, processes that are not rank 0 load something onto the rank 0 GPU.
I solved this issue by turning off BackgroundGenerator when using DDP.

I got exactly the same problem, but turning off BackgroundGenerator in DDP makes the data sampling phase much slower. Are there any better solutions for this?
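
(One possible workaround, sketched here under assumptions and not verified in this thread: pin each DDP process to its own CUDA device before any data-loading code runs, so tensors created on "cuda" inside the background thread land on the local rank's GPU instead of defaulting to cuda:0. LOCAL_RANK is the environment variable set by torchrun / torch.distributed.launch.)

import os
import torch

# assumes the process was started by torchrun, which sets LOCAL_RANK
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)  # make cuda:<local_rank> the default CUDA device

# build the dataset and DataLoaderX as usual after this point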
