BUSCO: one of the tools for assessing transcriptome and genome assembly quality
1. Usage:
usage: python BUSCO.py -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
2. Required arguments:
-i FASTA FILE, --in FASTA FILE
Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
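This is the input file: the assembled sequences in FASTA format. It can be a genome assembly, a transcriptome assembly, or the protein sequences of an annotated gene set.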
-o OUTPUT, --out OUTPUT
Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
Name used to label the output; it must not contain a path.
-m MODE, --mode MODE Specify which BUSCO analysis mode to run.
There are three valid modes:
- geno or genome, for genome assemblies (DNA)
Genome assembly
- tran or transcriptome, for transcriptome assemblies (DNA)
Transcriptome assembly
- prot or proteins, for annotated gene sets (protein)
Annotated gene set
-l LINEAGE, --lineage LINEAGE
Specify location of the BUSCO lineage data to be used.
Visit http://busco.ezlab.org for available lineages.
The lineage dataset (database) that the sequences are searched against.
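The four required options can be assembled programmatically. The following is a minimal Python sketch, not part of BUSCO itself: the function name build_busco_command and its checks are hypothetical, and only the flags and mode names come from the help text above.

```python
from pathlib import PurePath

# Mode spellings listed in the help text above.
VALID_MODES = {"geno", "genome", "tran", "transcriptome", "prot", "proteins"}


def build_busco_command(script, fasta, out_name, lineage, mode):
    """Return the argument list for a BUSCO run with the four required options.

    Hypothetical helper for illustration; BUSCO does not ship this function.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # -o takes a plain name, not a path, so reject anything with a directory part.
    if out_name in {"", ".", ".."} or PurePath(out_name).name != out_name:
        raise ValueError("-o must be a short name without a path")
    return ["python", str(script), "-i", str(fasta), "-o", out_name,
            "-l", str(lineage), "-m", mode]


if __name__ == "__main__":
    print(build_busco_command("BUSCO.py", "Bridger.fasta", "L",
                              "eukaryota_odb9", "tran"))
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting problems with paths that contain spaces.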
3. Optional arguments:
optional arguments:
-c N, --cpu N Specify the number (N=integer) of threads/cores to use.
Number of CPU threads
-e N, --evalue N E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
E-value cutoff for the BLAST alignments
-f, --force Force rewriting of existing files. Must be used when output files with the provided name already exist.
Overwrite files generated by a previous run
-r, --restart Restart an uncompleted run. Not available for the protein mode
Resume a run that did not finish
-sp SPECIES, --species SPECIES
Name of existing Augustus species gene finding parameters. See Augustus documentation for available options.
--augustus_parameters AUGUSTUS_PARAMETERS
Additional parameters for the fine-tuning of Augustus run. For the species, do not use this option.
Use single quotes as follow: '--param1=1 --param2=2', see Augustus documentation for available options.
-t PATH, --tmp PATH Where to store temporary files (Default: ./tmp)
--limit REGION_LIMIT How many candidate regions to consider (default: 3)
--long Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
-q, --quiet Disable the info logs, displays only errors
Print error messages only
-z, --tarzip Tarzip the output folders likely to contain thousands of files
Compress the output folders
-v, --version Show this version and exit
-h, --help Show this help message and exit
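A few of the optional flags interact with other settings, for example -r is not available in protein mode and -e expects a value such as 0.001 or 1e-03. A small Python sketch of how such settings might be turned into flags is shown here; the helper optional_flags is hypothetical, and it covers only a subset of the options listed above.

```python
def optional_flags(mode, cpu=None, evalue=None, force=False, restart=False,
                   quiet=False, tarzip=False):
    """Translate keyword settings into a list of optional BUSCO flags.

    Hypothetical helper; only the flag names are taken from the help text.
    """
    # The help text states that restart is not available for the protein mode.
    if restart and mode in {"prot", "proteins"}:
        raise ValueError("-r/--restart is not available in protein mode")
    flags = []
    if cpu is not None:
        flags += ["-c", str(int(cpu))]
    if evalue is not None:
        float(evalue)  # raises ValueError if the text is not a number
        flags += ["-e", str(evalue)]
    if force:
        flags.append("-f")
    if restart:
        flags.append("-r")
    if quiet:
        flags.append("-q")
    if tarzip:
        flags.append("-z")
    return flags


if __name__ == "__main__":
    print(optional_flags("tran", cpu=30, evalue="1e-10", force=True))
```

Passing the E-value as text keeps the notation the user typed, so 1e-10 is forwarded unchanged rather than reformatted.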
4. Example:
/USER/xwf/software/busco/BUSCO.py -i ../bridger_out_dir/Bridger.fasta -o L -l /USER/xwf/database/eukaryota_odb9 -m tran -c 30 -f -e 1e-10
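To see how the example command maps onto the options described earlier, the arguments can be decoded with a stand-in parser. This sketch mirrors only the flags used in the example; it is an illustration built with Python's argparse, not BUSCO's own argument parser, and the dest name fasta is an arbitrary choice.

```python
import argparse
import shlex

# Stand-in parser covering only the flags that appear in the example.
parser = argparse.ArgumentParser(prog="BUSCO.py")
parser.add_argument("-i", "--in", dest="fasta", required=True)
parser.add_argument("-o", "--out", required=True)
parser.add_argument("-l", "--lineage", required=True)
parser.add_argument("-m", "--mode", required=True)
parser.add_argument("-c", "--cpu", type=int)
parser.add_argument("-e", "--evalue", type=float, default=1e-03)
parser.add_argument("-f", "--force", action="store_true")

# Arguments of the example command, without the script path.
example = ("-i ../bridger_out_dir/Bridger.fasta -o L "
           "-l /USER/xwf/database/eukaryota_odb9 -m tran -c 30 -f -e 1e-10")
args = parser.parse_args(shlex.split(example))

# One line per setting: transcriptome mode, 30 threads, forced overwrite,
# and an E-value cutoff stricter than the 1e-03 default.
for name, value in sorted(vars(args).items()):
    print(f"{name}: {value}")
```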
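5. Output: the run produces run_L (because the example above sets the output name to L) and tmp. The main file to check is short_summary_L.txt inside run_L, in which:
S: Single copy  D: Duplicated  F: Fragmented  M: Missing
The S+D value (the complete BUSCOs) should not be too low. The database BUSCO uses consists of proteins conserved across related species, so a good assembly should recover a substantial number of these conserved proteins.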
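Locating the summary file and computing S+D can be scripted. The sketch below assumes the summary contains a one-line score of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:..; that notation is an assumption about the file contents, so adjust the pattern if your file differs. The function names and the sample numbers are made up for illustration.

```python
import re
from pathlib import Path


def summary_path(out_name, workdir="."):
    """Path of the short summary for a run, e.g. run_L/short_summary_L.txt for L."""
    return Path(workdir) / f"run_{out_name}" / f"short_summary_{out_name}.txt"


# Assumed one-line score notation; not confirmed by the text above.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_summary(text):
    """Extract the percentages from the score line and add the S+D total."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["S+D"] = round(scores["S"] + scores["D"], 1)
    return scores


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = "C:90.0%[S:85.0%,D:5.0%],F:6.0%,M:4.0%,n:303"
    print(summary_path("L"))
    print(parse_summary(sample))
```

In practice the text would come from summary_path("L").read_text(), and the S+D entry is the number to compare between assemblies.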