Map is slow and processes batches one after another

## Describe the bug

I've hit a bug that I can't pin down. The code works as expected on a small subset of my dataset (2,000 samples) on my local machine, but the problem occurs when I run the same code on a larger dataset (1.4 million samples). That's why I can't give exact steps to reproduce, I'm sorry.

I process a large dataset in two steps. First, I call `map` on a dataset I load from disk and create a new dataset from it. This works as expected, and `map` uses all the workers I started it with. Then I process the dataset created by the first step, again with `map`. This second call is really slow and only starts one or two processes at a time. The number of processes is the same for both steps.

Pseudocode:
```python
import datasets

ds = datasets.load_from_disk("path")
# Step 1: fast, uses all processes
new_dataset = ds.map(work, batched=True, ...)
# Step 2: slow, starts one process after another
final_dataset = new_dataset.map(work2, batched=True, ...)
```
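To narrow down where the second `map` call spends its time, one option is to wrap the mapped function so that every batch reports which process handled it and how long it took. The sketch below is only an illustration: `log_batches` is a hypothetical helper, the `work2` shown here is a stand-in rather than my real function, and it assumes each batch arrives as a dict of equal-length columns.

```python
import functools
import os
import time


def log_batches(fn):
    """Wrap a batched map function and report the PID and duration of each call."""

    @functools.wraps(fn)
    def wrapper(batch, *args, **kwargs):
        start = time.perf_counter()
        result = fn(batch, *args, **kwargs)
        elapsed = time.perf_counter() - start
        # Assumption: the batch is a dict mapping column names to equal-length lists.
        n_rows = len(next(iter(batch.values()))) if batch else 0
        print(f"pid={os.getpid()} rows={n_rows} took={elapsed:.3f}s", flush=True)
        return result

    return wrapper


# Stand-in for the real second-stage function (hypothetical).
def work2(batch):
    return {"length": [len(text) for text in batch["text"]]}


timed_work2 = log_batches(work2)
out = timed_work2({"text": ["a", "bcd"]})
```

Passing `log_batches(work2)` to the second `map` call in place of `work2` should show whether the batches really are handled by only one or two PIDs and whether individual batches are slow. Whether the wrapper serializes cleanly for the worker processes is an assumption I have not verified.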

## Expected results
The second stage should be as fast as the first stage.

## Versions
- Datasets: 1.5.0
- Python: 3.8.8 (default, Feb 24 2021, 21:46:12)
- Platform: Linux-5.4.0-60-generic-x86_64-with-glibc2.10    

Do you guys have any idea? Thanks a lot!
