github.com/kussell-lab/assemblyalignmentgenerator
Version: 0.0.0-20180209230228-253676b4e065
Repository: https://github.com/kussell-lab/assemblyalignmentgenerator.git
Documentation: pkg.go.dev

# README

AssemblyAlignmentGenerator

This program generates core-gene alignments from a list of assemblies. It downloads the genomic sequences from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/ and re-annotates them using Prokka. It then uses Roary to generate the pan-genome and extracts the core genome, which is the set of genes that appear in all the assemblies. The protein sequences of each core gene are aligned with MUSCLE, and the protein alignments are then back-translated to DNA sequences.
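The core-genome step described above amounts to a set intersection: a gene is kept only if every assembly carries it. The following is a minimal sketch in Go, not the program's own code. The presence map, the assembly names and the function name coreGenes are illustrative assumptions; Roary reports this information in its own output files, which are not parsed here.

```go
package main

import (
	"fmt"
	"sort"
)

// coreGenes returns the genes that are present in every assembly.
// presence maps a gene name to the set of assemblies that carry it.
// The map layout and the function name are assumptions made for this sketch.
func coreGenes(presence map[string]map[string]bool, assemblies []string) []string {
	var core []string
	for gene, carriers := range presence {
		inAll := true
		for _, a := range assemblies {
			if !carriers[a] {
				inAll = false
				break
			}
		}
		if inAll {
			core = append(core, gene)
		}
	}
	sort.Strings(core) // map iteration order is random, so sort for stable output
	return core
}

func main() {
	assemblies := []string{"asm1", "asm2", "asm3"}
	presence := map[string]map[string]bool{
		"dnaA": {"asm1": true, "asm2": true, "asm3": true},
		"recA": {"asm1": true, "asm2": true, "asm3": true},
		"blaZ": {"asm1": true, "asm3": true}, // missing from asm2, so not core
	}
	fmt.Println(coreGenes(presence, assemblies))
}
```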

Installation

The program is written in Bash, Go and Python. It requires at least the external programs named above (Prokka, Roary and MUSCLE),

and Python libraries:

  • pip install --user tqdm biopython

and Go libraries:

  • go get -u github.com/cheggaaa/pb
  • go get -u github.com/mattn/go-sqlite3
  • go get -u gopkg.in/alecthomas/kingpin.v2
  • go get -u github.com/kussell-lab/biogo/seq

A Dockerfile is also provided for building a Docker image (see https://docs.docker.com/ for how to use Docker). The Dockerfile also shows how to install this program on Ubuntu 17.10.

Usage

AssemblyAlignmentGenerate <assembly summary file> <accession list file> <output directory> <output prefix>

The output is an XMFA file containing the final DNA alignments of the core genes. The file can be found at <output directory>/<output prefix>_core.xmfa.
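For downstream processing, the XMFA output can be read one alignment block at a time. The sketch below is an assumption-laden reader, not part of this package: it assumes the usual XMFA layout in which each entry starts with a ">" header line and each alignment block is closed by a line beginning with "=". The names record, readXMFA and readXMFAString are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// record is one sequence entry inside an alignment block (hypothetical type).
type record struct {
	Header string
	Seq    string
}

// readXMFA splits an XMFA stream into alignment blocks.
// Assumption: ">" starts an entry and a line beginning with "=" ends a block.
func readXMFA(r io.Reader) ([][]record, error) {
	var blocks [][]record
	var block []record
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long sequence lines
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue // skip blank and comment lines
		case strings.HasPrefix(line, "="):
			if len(block) > 0 {
				blocks = append(blocks, block)
			}
			block = nil
		case strings.HasPrefix(line, ">"):
			block = append(block, record{Header: strings.TrimSpace(line[1:])})
		default:
			if len(block) == 0 {
				return nil, fmt.Errorf("sequence data before any header: %q", line)
			}
			block[len(block)-1].Seq += line // sequences may span several lines
		}
	}
	if len(block) > 0 {
		blocks = append(blocks, block) // tolerate a missing final "="
	}
	return blocks, scanner.Err()
}

// readXMFAString is a convenience wrapper for in-memory text.
func readXMFAString(s string) ([][]record, error) {
	return readXMFA(strings.NewReader(s))
}

func main() {
	blocks, err := readXMFAString(">asm1 geneA\nATGAAA\n>asm2 geneA\nATG---\n=\n")
	if err != nil {
		panic(err)
	}
	for i, b := range blocks {
		fmt.Printf("block %d: %d sequences\n", i, len(b))
	}
}
```

To read the real output, open <output directory>/<output prefix>_core.xmfa with os.Open and pass the file to readXMFA.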

# Functions

BackTranslate back-translates an amino acid alignment to nucleotide sequences.
MultiAlign performs a multiple sequence alignment of protein sequences and back-translates the result to nucleotide sequences.
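
Back-translation can be illustrated with a short sketch: the unaligned coding sequence is threaded through the aligned protein, so that each residue consumes one codon and each gap becomes a three-base gap. The function below is a hypothetical illustration of that idea; its name and signature are assumptions and do not reproduce the package's BackTranslate.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// backTranslate threads the unaligned nucleotide sequence nucl through the
// aligned protein sequence alignedProt. Each residue consumes one codon and
// each gap character '-' becomes a three-base gap. Any bases left over at the
// end of nucl (for example a stop codon) are ignored.
// Illustrative sketch only; not the package's BackTranslate signature.
func backTranslate(alignedProt, nucl string) (string, error) {
	var b strings.Builder
	pos := 0
	for _, aa := range alignedProt {
		if aa == '-' {
			b.WriteString("---")
			continue
		}
		if pos+3 > len(nucl) {
			return "", errors.New("nucleotide sequence is shorter than the protein requires")
		}
		b.WriteString(nucl[pos : pos+3])
		pos += 3
	}
	return b.String(), nil
}

func main() {
	// Protein "MK" aligned as "M-K"; its coding sequence is ATG AAA.
	dna, err := backTranslate("M-K", "ATGAAA")
	if err != nil {
		panic(err)
	}
	fmt.Println(dna) // ATG---AAA
}
```

Because gaps are inserted in whole codons, the resulting DNA alignment keeps the reading frame of every sequence.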

# Structs

SeqRecord contains a record for a sequence.
SeqSet contains sets of sequences.

# Type aliases

MultiAlignFunc is an interface for multiple alignment functions.