
High Load Issue: dbcan_sub Creating Excessive Threads #151

Closed
trx296554555 opened this issue Jan 14, 2024 · 2 comments

Comments

@trx296554555

Thank you for your hard work and recent updates. However, I want to bring to your attention an ongoing issue in the latest version, run_dbcan 4.1.1: when processing large input sequence files, dbcan_sub creates an excessive number of threads, resulting in high system load. (See also #117.)

This issue persists even when specifying parameters such as --dbcan_thread and --hmm_cpu, as there seems to be no effective limitation on the number of threads being created.

After reviewing the code of run_dbcan.py, I have identified that the issue lies in the function split_uniInput: this section of code directly launches as many subprocesses as there are small files generated by splitting the large input file. https://github.com/linnabrown/run_dbcan/blob/707aed21a0ef455828126f1afb5820963e8274ca/dbcan/cli/run_dbcan.py#L139C1-L157C22
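To illustrate the pattern described above (this is a hypothetical, simplified sketch, not the actual run_dbcan code; `split_files` and the `true` command stand in for the real split chunks and hmmsearch invocations):

from subprocess import Popen

# Stand-in for the chunks produced by splitting a large input file.
split_files = [f"uniInput.part_{i}" for i in range(4)]

# The problematic pattern: one subprocess per split file, with no upper bound,
# so a large input spawns hundreds of concurrent processes.
procs = [Popen(["true"]) for _ in split_files]
for p in procs:
    p.wait()

With a real multi-gigabyte input, the number of split files (and thus concurrent hmmsearch processes) is unbounded, which is what drives the load spike.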

I modified this section to prevent excessive load when running it myself, by adding a simple thread pool. I'm unsure whether this could affect other parts of the program, so I offer it as a reference only.

from concurrent.futures import ThreadPoolExecutor, as_completed
from subprocess import Popen

def run_command(cmd):
    # Run one command to completion in a subprocess and return it.
    hmmer = Popen(cmd)
    hmmer.wait()
    return cmd

# Cap the number of concurrent hmmsearch subprocesses at --dbcan_thread.
max_workers = dbcan_thread
cmds = []
for j in split_files:
    cmds.append(["hmmsearch", "--domtblout", f"{outPath}d{j}", "--cpu", "2", "-o", "/dev/null",
                 f"{dbDir}dbCAN_sub.hmm", f"{outPath}{j}"])

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = [executor.submit(run_command, cmd) for cmd in cmds]
    for future in as_completed(futures):
        try:
            command = future.result()
            print(f"Command: {' '.join(command)} completed.")
        except Exception as e:
            print(f"An error occurred: {e}")

Best,
Robin

@linnabrown
Owner

Hi Robin,

Thank you so much for bringing this up. We previously used this approach because hmmscan does not support multithreading, but hmmsearch does. Therefore, we will remove the multiprocessing part and rely on the built-in multithreading of hmmsearch instead.

Let me delete and test that code, and I will publish version 4.1.2. Thank you so much!
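Roughly, the planned change could look like this (a hypothetical sketch only: the exact paths, output names, and parameter plumbing in 4.1.2 may differ; here we only build and print the command rather than invoke hmmsearch):

# One hmmsearch call over the whole un-split input, using hmmsearch's
# built-in "--cpu" threading instead of launching many subprocesses.
dbDir, outPath, hmm_cpu = "db/", "output/", 8  # hypothetical values

cmd = ["hmmsearch", "--domtblout", f"{outPath}dtemp.out", "--cpu", str(hmm_cpu),
       "-o", "/dev/null", f"{dbDir}dbCAN_sub.hmm", f"{outPath}uniInput"]
print(" ".join(cmd))

This way the concurrency is bounded by a single, explicit --cpu value instead of by the number of split files.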

Best,
Le

@HaidYi
Collaborator

HaidYi commented Jan 16, 2024

Our 4.1.2 version has been released. Problem solved.

@HaidYi HaidYi closed this as completed Jan 16, 2024