
Parallelize file parsing #247

Open
YuriRomanowski opened this issue Dec 16, 2022 · 2 comments

Comments

@YuriRomanowski
Contributor

YuriRomanowski commented Dec 16, 2022

Clarification and motivation

This topic is part of #221.
After we read the file contents, we should parse the files, which is (in theory) a pure action and can be parallelized.
However, since we use a C library under the hood, parallelization may be tricky. Here we can try several approaches and discuss the results.
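For context, the "pure parsing can be parallelized" idea amounts to something like the following toy sketch using `par`/`pseq` from base (roughly what `parList` from the Eval monad does under the hood). Here `parse` is a hypothetical stand-in for the real markdown parser:

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical stand-in for the pure parsing step; the real code would
-- run the markdown parser over the file contents.
parse :: String -> Int
parse = length

-- Spark the parse of each element while evaluating the rest of the list.
parMapSimple :: (a -> b) -> [a] -> [b]
parMapSimple _ []       = []
parMapSimple f (x : xs) =
  let y  = f x
      ys = parMapSimple f xs
  in y `par` (ys `pseq` (y : ys))

main :: IO ()
main = print (parMapSimple parse ["ab", "abc", ""])  -- [2,3,0]
```

Note that sparks only pay off if the sparked value is actually forced; with a foreign parser the FFI call itself happens eagerly inside the call, which is part of why this may be tricky.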

Acceptance criteria

  • Several parallelization approaches are tried
  • We decide how to handle foreign calls during parsing when parallelizing
  • A speedup is obtained and confirmed with measurements
@YuriRomanowski
Contributor Author

I uploaded some commits where different variants of xrefcheck can be load-tested (in the branch YuriRomanowski/#247-parallelize-file-parsing-scaffolding):

  • Original version (from master) with lazy readFile: 2a959d0
  • Lazy readFile replaced with a strict one: b3368c3
  • File reading forced, then files processed in parallel using the Eval monad: a1d5f56
  • File reading forced, then files processed using mapConcurrently: 8f65374

The latter two produce similar results.
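For readers unfamiliar with the last approach: `mapConcurrently` (from the async package) runs an IO action over each element in its own thread. A minimal base-only sketch of the same idea, with hypothetical names:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Toy version of mapConcurrently using only base: fork one thread per
-- item and collect each result through its own MVar. (The real
-- mapConcurrently also propagates exceptions and cancels siblings;
-- this sketch does not.)
mapConcurrently' :: (a -> IO b) -> [a] -> IO [b]
mapConcurrently' f xs = do
  vars <- mapM (\x -> do
            v <- newEmptyMVar
            _ <- forkIO (f x >>= putMVar v)
            pure v) xs
  -- Results come back in input order, regardless of completion order.
  mapM takeMVar vars

main :: IO ()
main = do
  rs <- mapConcurrently' (\n -> pure (n * (2 :: Int))) [1 .. 4]
  print rs  -- [2,4,6,8]
```

Unlike sparks, forked threads make the FFI parsing calls run in genuinely separate Haskell threads, which is why forcing the file reads first matters: otherwise lazy IO would interleave disk reads with the concurrent parsing.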

@Martoon-00
Member

Thanks for this investigation!

I tried, and from what I can see:

  • Repo scanning time does not differ dramatically across these scenarios (0.9 s / 0.7 s / 0.5 s / 0.5 s).
  • My impression is that in this load test there was simply no room for parallelization (this is what we saw in the picture: the Sparks tab shows that a few sparks were bound to different cores, but most of them went to one core, probably simply because parsing was fast enough to be processed entirely there).
  • I created 4 dummy markdown files, 50 KB each, and the sparks solution then showed 4 cores in use.
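For reproducibility, a sparks/eventlog view like the one below can be obtained by building with the threaded runtime and eventlog support and opening the result in ThreadScope (a sketch; the module name is hypothetical):

```shell
# Build with the threaded RTS and eventlog support.
ghc -O2 -threaded -eventlog -rtsopts Main.hs

# Run on 4 capabilities; -ls writes Main.eventlog for ThreadScope.
./Main +RTS -N4 -ls
```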

(the selected area corresponds to repo scanning time)
[Screenshot from 2023-01-31 21-39-58]

Although I'm not exactly sure why the "Activity" graph at the top shows so little CPU usage.
