BUSCO: one of the tools for assessing transcriptome and genome assembly quality

2025-10-09 00:45:11

1. Usage:

usage: python BUSCO.py -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

2. Required arguments:

-i FASTA FILE, --in FASTA FILE

                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.

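This is the input file: the assembled sequences in FASTA format. It can be a genome assembly, a transcriptome assembly, or the protein sequences of an annotated gene set.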

  -o OUTPUT, --out OUTPUT

                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path

The name used to label the output. It must not include a path.

  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.

                        There are three valid modes:

                        - geno or genome, for genome assemblies (DNA)

                                          genome assembly

                        - tran or transcriptome, for transcriptome assemblies (DNA)

                                          transcriptome assembly

                        - prot or proteins, for annotated gene sets (protein)

                                          annotation (predicted gene set)

  -l LINEAGE, --lineage LINEAGE

                        Specify location of the BUSCO lineage data to be used.

                        Visit http://busco.ezlab.org for available lineages.

                                          the lineage dataset that the input is compared against
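
The constraints on the required arguments can be checked before a run is launched. The following is a minimal sketch, not part of BUSCO itself: the function name `check_required_args` is hypothetical, and the accepted mode names are simply those listed in the help text above.

```python
import os

# Mode names accepted by -m, as listed in the help text above.
VALID_MODES = {"geno", "genome", "tran", "transcriptome", "prot", "proteins"}


def check_required_args(out_name, mode):
    """Return a list of problems found in the -o and -m values (empty if none)."""
    problems = []
    # -o must be a bare name: the help text warns against providing a path.
    if not out_name or os.path.basename(out_name) != out_name:
        problems.append("-o must be a short name without a path: %r" % out_name)
    if mode not in VALID_MODES:
        problems.append("-m must be one of %s: %r" % (sorted(VALID_MODES), mode))
    return problems


print(check_required_args("L", "tran"))
print(check_required_args("results/L", "rna"))
```

An empty list means both values look acceptable; otherwise each entry describes one problem.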

3. Optional arguments:

optional arguments:

  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.

Number of CPU threads.

  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)

E-value cutoff for the alignment.

  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.

Overwrite files generated by a previous run.

  -r, --restart         Restart an uncompleted run. Not available for the protein mode

Resume a run that did not finish.

  -sp SPECIES, --species SPECIES

                        Name of existing Augustus species gene finding parameters. See Augustus documentation for available options.

  --augustus_parameters AUGUSTUS_PARAMETERS

                        Additional parameters for the fine-tuning of Augustus run. For the species, do not use this option.

                        Use single quotes as follow: '--param1=1 --param2=2', see Augustus documentation for available options.

  -t PATH, --tmp PATH   Where to store temporary files (Default: ./tmp)

  --limit REGION_LIMIT  How many candidate regions to consider (default: 3)

  --long                Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms

  -q, --quiet           Disable the info logs, displays only errors

Print error messages only.

  -z, --tarzip          Tarzip the output folders likely to contain thousands of files

Compress the output folders.

  -v, --version         Show this version and exit

  -h, --help            Show this help message and exit

4. Example:

/USER/xwf/software/busco/BUSCO.py -i ../bridger_out_dir/Bridger.fasta -o L -l /USER/xwf/database/eukaryota_odb9 -m tran -c 30 -f -e 1e-10
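
When many assemblies need to be evaluated, the same command can be assembled from a script. The sketch below is one possible wrapper, not something BUSCO provides: `build_busco_command` and `run_busco` are hypothetical helper names, only flags documented above are used, and the script is invoked through `python` as in the usage line.

```python
import subprocess


def build_busco_command(busco_py, fasta, out_name, lineage, mode,
                        cpu=None, evalue=None, force=False):
    """Assemble the argument list for a BUSCO.py run using the flags documented above."""
    cmd = ["python", busco_py, "-i", fasta, "-o", out_name, "-l", lineage, "-m", mode]
    if cpu is not None:
        cmd += ["-c", str(cpu)]
    if evalue is not None:
        cmd += ["-e", str(evalue)]
    if force:
        cmd.append("-f")
    return cmd


def run_busco(cmd):
    """Run the command; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Same settings as the example above; the paths are the author's and will differ elsewhere.
cmd = build_busco_command(
    "/USER/xwf/software/busco/BUSCO.py",
    "../bridger_out_dir/Bridger.fasta",
    "L",
    "/USER/xwf/database/eukaryota_odb9",
    "tran",
    cpu=30, evalue="1e-10", force=True,
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems; `run_busco(cmd)` would start the actual run.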

5. Output: the run produces run_L (because the example above sets the output name to L) and tmp. The main file to look at is short_summary_L.txt inside run_L, in which:

S:Single copy D:Duplicated F:Fragmented M:Missing

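In the result, S+D (the complete BUSCOs) should not be too low. The database BUSCO uses consists of proteins conserved across related species, so a good assembly should recover a substantial number of them.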
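
The S+D check can be scripted. The sketch below assumes the summary file contains a one-line score of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`; that layout is an assumption about the BUSCO version used here and should be checked against an actual short_summary_L.txt. The function name `parse_summary_line` is hypothetical and the numbers are made up.

```python
import re

# Assumed layout of the one-line summary, e.g. "C:90.0%[S:85.0%,D:5.0%],F:6.0%,M:4.0%,n:303".
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary_line(text):
    """Return the percentages and BUSCO count from the first summary line found, or None."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


# Made-up numbers for illustration only, not results from a real run.
example = "\tC:90.0%[S:85.0%,D:5.0%],F:6.0%,M:4.0%,n:303"
scores = parse_summary_line(example)
complete = scores["S"] + scores["D"]
print("S+D = %.1f%% of %d BUSCOs" % (complete, scores["n"]))
```

To use it on a real run, the contents of short_summary_L.txt would be read and passed to `parse_summary_line` in place of `example`.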
