MetaSanity - Output

In a previous post, we walked through several examples of using BioMetaDB, a Bio-focused command-line SQL wrapper package, to query the results of our MetaSanity pipeline runs. The contents of each project are self-contained, making it easy for users to focus on retrieving specific information.

Given the complexity of this application and the volume of output it produces, we will walk through the raw and parsed output from each of the programs that MetaSanity runs. This information is also available as a PowerPoint presentation (best viewed in PowerPoint Online).

MetaSanity output

├── evaluation.tsv
├── functions.tsv
├── checkm_results/
├── fastani_results/
├── gtdbtk_results/
├── prodigal_results/
├── prokka_results/
├── interproscan_results/
├── kegg_results/
├── cazymerops_results/
├── virsorter_results/
├── mag1.annotation.tsv

The first two .tsv files contain the parsed results of the MetaSanity pipelines. These are simple tab-delimited text files. PhyloSanity results are contained in evaluation.tsv. FuncSanity results are split across several files: functions.tsv contains functional and metabolic information for all genomes that were studied, and a file ending with the extension .annotation.tsv is generated for each genome that was analyzed.
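Because these parsed results are plain tab-delimited text, they can be read with standard tooling. The following is a minimal Python sketch, not part of MetaSanity itself. It assumes that the first row of each file is a header row, which is not stated above, and the helper names `read_tsv` and `find_annotation_files` are hypothetical.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-delimited file into a list of dicts keyed by the header row.

    Assumption: the first line of the file holds the column names.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def find_annotation_files(output_dir):
    """Return the per-genome *.annotation.tsv files in an output directory, sorted by name."""
    return sorted(Path(output_dir).glob("*.annotation.tsv"))
```

For example, `read_tsv("evaluation.tsv")` would return one dict per row, and `find_annotation_files(".")` would list files such as mag1.annotation.tsv when run from the output directory.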

The remaining directories contain the raw output from each of the programs that were incorporated into the user’s annotation pipeline. The resulting file(s) are parsed for specific information to generate the workflow’s results.



checkm_results/ is parsed for contamination and completion estimates.


fastani_results/fastani_results.txt contains the ANI information used in redundancy determination.


gtdbtk_results/GTDBTK.bac120.summary.tsv and gtdbtk_results/GTDBTK.ar122.summary.tsv optionally provide putative phylogeny.
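To inspect the ANI values directly, the sketch below reads fastani_results.txt in Python. It assumes the file keeps FastANI's default five-column, tab-delimited layout (query, reference, ANI, mapped fragments, total query fragments), which is not confirmed above. The function names and the `min_ani` cutoff are hypothetical, and the cutoff is not the threshold MetaSanity itself applies.

```python
import csv


def read_fastani(path):
    """Parse FastANI-style output into (query, reference, ani, mapped, total) tuples.

    Assumption: five tab-delimited columns per line and no header row.
    Lines with fewer than five columns are skipped.
    """
    pairs = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 5:
                continue
            query, reference, ani, mapped, total = row[:5]
            pairs.append((query, reference, float(ani), int(mapped), int(total)))
    return pairs


def pairs_above(pairs, min_ani):
    """Return (query, reference, ani) for distinct genomes at or above min_ani."""
    return [(q, r, ani) for q, r, ani, _, _ in pairs if q != r and ani >= min_ani]
```

Calling `pairs_above(read_fastani("fastani_results/fastani_results.txt"), min_ani)` with a cutoff of your choosing lists genome pairs that may be redundant under that cutoff.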


For each MAG that is analyzed, a series of files may be generated.

Prodigal gene caller

├── mag1.mrna.fna
├── mag1.protein.faa
├── mag1.txt
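As a quick check on the gene calls, the sketch below reads a file such as mag1.protein.faa and returns its sequences. It assumes the file is standard FASTA, as the .faa extension suggests, and it uses the text up to the first whitespace in each header line as the record identifier. The function name `read_fasta` is hypothetical.

```python
def read_fasta(path):
    """Read a FASTA file into a dict mapping record identifier to sequence.

    The identifier is the header text after '>' up to the first whitespace.
    Sequence lines that span multiple lines are joined together.
    """
    chunks = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}
```

The number of predicted proteins for a MAG is then `len(read_fasta("prodigal_results/mag1.protein.faa"))`.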

Prokka (optional gene caller and annotation pipeline)

├── mag1/
  ├── mag1.tsv  # Original output
  ├── mag1.prk.tsv.amd  # Parsed output for BioMetaDB
  ├── mag1.prk.tsv.prokka.nucl  # RNA features
├── diamond/
  ├── mag1.prk-to-prd.tsv  # Protein annotations

The remaining files in each directory are the outputs from each Prokka run.


InterProScan

├── mag1.tsv  # Original output
├── mag1.amended.tsv  # Parsed output

InterProScan contributes the most to MetaSanity runtimes.

KoFamScan and KEGG-Decoder

├── biodata_results/  # KEGG-Decoder-derived metabolic pathway estimation counts
  ├── KEGG.decoder.tsv 
├── combined_results/
  ├── combined.ko  # kofamscan matches for KEGG-Decoder input
  ├── combined.protein  # gene calls for KEGG-Decoder input
  ├── combined.expander.tbl  # HMM search results (part of BioData pipeline)
  ├── combined.expander.tsv  # Raw counts from BioData pipeline
  ├── combined.html  # Heatmap of putative metabolic pathway completion estimates
├── kofamscan_results/
  ├── mag1.tsv  # KO matches per protein
  ├── mag1.detailed  # Detailed kofamscan output
  ├── mag1.amended.tbl  # Parsed output for BioMetaDB


MEROPS

├── mag1.merops.tsv  # MEROPS matches
├── mag1.pfam.tsv  # MEROPS matches by PFam id
├── merops/
  ├── mag1.merops.protein.faa  # MEROPS protein matches
  ├── hmmconvert_data/  # HMM prep
  ├── hmmsearch_results/  # HMMSearch output


CAZy

├── cazy/
  ├── hmmsearch_results/  # HMMSearch output
  ├── mag1.cazy_assignments.tsv  # Raw counts of CAZy HMM matches
  ├── mag1.cazy_assignments.byprot.tsv  # Protein annotations

PSORTb and SignalP

├── psortb_results/
  ├── mag1.tbl  # Raw PSORTb output
├── signalp_results/
  ├── mag1.signalp.tbl  # Raw SignalP output

Extracellular Peptidase

├── mag1.pfam.by_prot.tsv  # PFam annotations for peptidase results


VirSorter

├── mag1
  ├── virsorter_results/
    ├── VIRSorter_global-phage-signal.csv  # Original output
    ├── mag1.VIRSorter_adj_out.tsv  # Parsed output for BioMetaDB

The remaining files in each virsorter_results directory are the raw outputs of the VirSorter run for each MAG.