finding %completeness of CGC in the microbial genome of interest <Theory question and suggestions> #161

Jigyasa3 · 2024-02-06T17:55:46Z

Thank you again for such an important tool for annotating CAZymes and identifying CGCs in microbial genomes of interest.
I am interested in examining how complete the CGCs in my microbial genome of interest are.
For example, suppose dbCAN3 identifies 5 CGCs in my genome. To estimate the %completeness of these CGCs, I extract the nucleotide sequences spanning the start and end coordinates of each CGC, and likewise of each PUL in the dbCAN-PUL database. I then run a BLASTn search of the 5 CGC sequences against the complete dbCAN-PUL database to get %similarity and %coverage.

Is that a correct approach?
My goal is to state, on bioinformatic grounds, that we found 5 CGCs in the microbial genome that are XYZ% similar to known PULs and ABC% complete, so we can speculate that these CGCs are functional. If the similarity and coverage are below ~40% (an arbitrary cutoff), the CGC would be either novel or non-functional.
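For illustration, the %similarity and %coverage described above could be summarized from BLASTn tabular output roughly as follows. This is a minimal sketch, assuming the default 12-column `-outfmt 6` layout and a user-supplied dictionary of CGC sequence lengths; the function name `summarize_hits` and the IDs `CGC1` and `PUL0001` are hypothetical. It merges overlapping alignments on the query so that the same stretch of the CGC is not counted twice.

```python
from collections import defaultdict


def summarize_hits(blast_lines, query_lengths):
    """Summarize BLAST tabular hits per (query, subject) pair.

    Assumes the default 12-column "-outfmt 6" layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Returns {(query, subject): (length_weighted_identity_pct, query_coverage_pct)}.
    """
    spans = defaultdict(list)               # query intervals covered by HSPs
    ident = defaultdict(lambda: [0.0, 0])   # [sum(pident * length), sum(length)]
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1])
        pident, aln_len = float(fields[2]), int(fields[3])
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        spans[key].append((qstart, qend))
        ident[key][0] += pident * aln_len
        ident[key][1] += aln_len

    summary = {}
    for key, intervals in spans.items():
        intervals.sort()
        covered = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:        # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                           # gap: close the current interval
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        coverage = 100.0 * covered / query_lengths[key[0]]
        identity = ident[key][0] / ident[key][1]
        summary[key] = (identity, coverage)
    return summary


# Made-up example: two overlapping HSPs between one CGC and one PUL.
example_rows = [
    "CGC1\tPUL0001\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150",
    "CGC1\tPUL0001\t80.0\t100\t20\t0\t51\t150\t200\t299\t1e-20\t100",
]
example_summary = summarize_hits(example_rows, {"CGC1": 300})
```

Here the two hits together cover positions 1 to 150 of a 300 bp query, so the coverage is computed on the merged interval rather than on the sum of alignment lengths. The values in `example_rows` are invented for the example and are not real dbCAN-PUL results.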

Looking forward to your suggestions and reply!
Regards,
Jigyasa

yinlabniu · 2024-02-06T19:01:24Z

The short answer is yes. We used a similar strategy in dbCAN3 when predicting substrates for CGCs by BLAST search against dbCAN-PUL, although our parsing thresholds are more relaxed (minimum identity of 20% and at least 2 CAZyme matches to call a CGC-PUL pair). However, I should mention that the boundaries of CGCs (which affect the length of CGCs) have never been rigorously evaluated. PUL boundaries are often experimentally determined (e.g., through RNA-seq differential expression), but CGC boundaries are set arbitrarily by our CGC prediction criteria (default: at least one CAZyme and one transporter, with fewer than 2 inserted non-signature genes; users can customize this). Therefore, in many cases, the %coverage or completeness cutoff you mentioned is difficult to determine.
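As a rough illustration of the relaxed thresholds mentioned above (minimum identity 20%, at least 2 CAZyme matches), a CGC-PUL pair could be called along these lines. This is only a sketch under assumptions, not dbCAN3's actual implementation: the input tuple layout, the function name `call_cgc_pul_pairs`, and all IDs are hypothetical.

```python
from collections import defaultdict

MIN_IDENTITY = 20.0        # percent identity threshold quoted above
MIN_CAZYME_MATCHES = 2     # CAZyme matches needed to call a CGC-PUL pair


def call_cgc_pul_pairs(hits, min_identity=MIN_IDENTITY,
                       min_cazyme_matches=MIN_CAZYME_MATCHES):
    """Return sorted (cgc_id, pul_id) pairs supported by enough CAZyme hits.

    `hits` is an iterable of (cgc_id, gene_id, pul_id, pident, is_cazyme)
    tuples; this layout is an assumption made for the example.
    """
    matched = defaultdict(set)  # (cgc_id, pul_id) -> distinct CAZyme genes
    for cgc_id, gene_id, pul_id, pident, is_cazyme in hits:
        if is_cazyme and pident >= min_identity:
            matched[(cgc_id, pul_id)].add(gene_id)
    return sorted(pair for pair, genes in matched.items()
                  if len(genes) >= min_cazyme_matches)


# Made-up hits for illustration only.
example_hits = [
    ("CGC1", "g1", "PUL0001", 35.0, True),
    ("CGC1", "g2", "PUL0001", 22.5, True),
    ("CGC1", "g3", "PUL0002", 60.0, True),
    ("CGC1", "g4", "PUL0002", 15.0, True),   # below the identity threshold
    ("CGC2", "g5", "PUL0001", 40.0, False),  # not a CAZyme
]
example_pairs = call_cgc_pul_pairs(example_hits)
```

Counting distinct CGC genes rather than raw hits keeps a single CAZyme with several alignments to the same PUL from satisfying the two-match requirement on its own.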

linnabrown · 2024-04-19T00:51:25Z

Do you still have questions? @Jigyasa3 If not, please close the issue.
