Thank you for your hard work and recent updates. However, I wanted to bring to your attention an ongoing issue with the latest version of run_dbcan-4.1.1. When processing large input sequence files, dbcan_sub tends to create an excessive number of threads, resulting in high system load. #117
This issue persists even when specifying parameters such as --dbcan_thread and --hmm_cpu, as there is no effective limit on the number of threads being created.
After reviewing the code of run_dbcan.py, I have identified that the issue lies in the function split_uniInput: this section of code directly launches as many subprocesses as the number of small files generated by splitting the large input file. https://github.com/linnabrown/run_dbcan/blob/707aed21a0ef455828126f1afb5820963e8274ca/dbcan/cli/run_dbcan.py#L139C1-L157C22
I modified this specific code section to prevent excessive load when using it myself. I implemented a simple thread pool, but I am unsure whether this could affect other parts of the program, so I offer it as a reference only.
from concurrent.futures import ThreadPoolExecutor, as_completed
from subprocess import Popen

def run_command(cmd):
    # Launch one hmmsearch process and block until it finishes.
    hmmer = Popen(cmd)
    hmmer.wait()
    return cmd

# Cap the number of concurrently running hmmsearch processes at --dbcan_thread.
max_workers = dbcan_thread
cmds = []
for j in split_files:
    cmds.append(["hmmsearch", "--domtblout", f"{outPath}d{j}", "--cpu", "2",
                 "-o", "/dev/null", f"{dbDir}dbCAN_sub.hmm", f"{outPath}{j}"])

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = [executor.submit(run_command, cmd) for cmd in cmds]
    for future in as_completed(futures):
        try:
            command = future.result()
            print(f"Command: {' '.join(command)} completed.")
        except Exception as e:
            print(f"An error occurred: {e}")
Best,
Robin
Thank you so much for pointing this out. We previously used this approach because hmmscan does not support multithreading but hmmsearch does. Therefore, we will remove the multi-processing part and just use the built-in multithreading of hmmsearch.
Let me delete and test the code, and I will release version 4.1.2. Thank you so much!
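For reference, the direction described above would replace the split-and-spawn loop with a single hmmsearch invocation over the unsplit input, letting hmmsearch's own --cpu flag handle parallelism. A hedged sketch of what that command construction might look like; the helper name build_hmmsearch_cmd and the output filename are hypothetical, while dbDir, outPath, and the --cpu/--domtblout flags mirror the thread above:

```python
from subprocess import Popen

def build_hmmsearch_cmd(dbDir, outPath, input_fasta, hmm_cpu):
    # One hmmsearch over the whole input; threading is delegated to --cpu.
    return ["hmmsearch", "--domtblout", f"{outPath}dbcan_sub.tbl",
            "--cpu", str(hmm_cpu), "-o", "/dev/null",
            f"{dbDir}dbCAN_sub.hmm", input_fasta]

cmd = build_hmmsearch_cmd("db/", "out/", "input.faa", 8)
print(cmd)
# To actually run it where hmmsearch is installed:
# Popen(cmd).wait()
```

This keeps the thread count bounded by a single, user-visible knob (--hmm_cpu) instead of scaling with the number of split files.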