Biotoolomics

From Omics.org


The biotoolome comprises the many types of inference methods designed to detect and understand biological phenomena in the omics subfields.
Standalone programs for biological inference can be categorized according to these omics categories.
 


1) Population study

ANCESTRYMAP version 1.0 is now available.

ANCESTRYMAP screens the genome of a recently mixed population, such as African Americans, for segments with increased ancestry from one of the ancestral populations; such segments can indicate the position of disease genes (Patterson et al. 2004). ANCESTRYMAP (like the similar Markov chain Monte Carlo admixture mapping programs ADMIXMAP and MALDSOFT) requires genotyping data from individuals of recently mixed ancestry. To assist in running the program, we have provided a tutorial (online / PDF) and detailed documentation (online / PDF). Source code and executables for ANCESTRYMAP can be downloaded for UNIX or Linux. For questions, write to Arti Tandon (atandon at broad.mit.edu) or Nick Patterson (nickp at broad.mit.edu).
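The signal ANCESTRYMAP searches for can be illustrated with a far simpler, case-only statistic. The sketch below is not ANCESTRYMAP's method, which infers ancestry itself. It assumes local-ancestry calls are already available as copies (0, 1 or 2) of population-1 ancestry per case and locus. The function `excess_ancestry_z` and its inputs are hypothetical. At each locus it computes a z-score comparing the cases' mean population-1 ancestry with a supplied genome-wide average.

```python
import math


def excess_ancestry_z(local_ancestry, genome_mean):
    """Case-only scan for loci with unusually high ancestry from population 1.

    local_ancestry: one list per case giving copies (0, 1 or 2) of
        population-1 ancestry at each locus (assumed to be pre-computed).
    genome_mean: genome-wide average proportion of population-1 ancestry.
    Returns one z-score per locus; large positive values flag candidate regions.
    """
    n_cases = len(local_ancestry)
    n_loci = len(local_ancestry[0])
    scores = []
    for locus in range(n_loci):
        props = [case[locus] / 2 for case in local_ancestry]
        mean = sum(props) / n_cases
        var = sum((x - mean) ** 2 for x in props) / (n_cases - 1)
        se = math.sqrt(var / n_cases)
        # The z-score is undefined when every case carries the same ancestry.
        scores.append((mean - genome_mean) / se if se > 0 else float("nan"))
    return scores


cases = [[2, 1], [2, 1], [1, 1], [2, 0]]
print(excess_ancestry_z(cases, genome_mean=0.5))  # [3.0, -1.0]
```

A locus where cases carry much more population-1 ancestry than the genome-wide average is the kind of segment the text describes. Real admixture mapping must also account for uncertainty in the ancestry estimates and for multiple testing.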


EIGENSTRAT version 1.0 is now available.  

EIGENSTRAT (described in Price et al. 2006) detects and corrects for population stratification in genome-wide association studies. The method, based on principal components analysis, explicitly models ancestry differences between cases and controls along continuous axes of variation. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The approach is powerful as well as fast, and can easily be applied to disease studies with hundreds of thousands of markers. Source code, documentation and executables for running the EIGENSTRAT method on a Linux platform can be downloaded here. For any questions about this software, write to Alkes Price (see Price et al. 2006 for contact info). EIGENSTRAT users are advised to use our newer EIGENSOFT package (see below), which subsumes all EIGENSTRAT functionality; see the documentation included in the EIGENSOFT package for details.
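The PCA correction can be sketched in pure Python for a single axis of variation (k = 1). This is an illustration under stated assumptions, not EIGENSOFT code. Genotypes are 0/1/2 allele counts with one row per marker, and normalization follows the scaling described in Price et al. (2006). The leading axis is found by power iteration. All function names (`top_axis`, `remove_axis`, `association_chi2`, `eigenstrat_one_axis`) are hypothetical. Both the test marker and the phenotype are regressed on the axis before computing (N - k - 1) * r^2, which is compared with a chi-square distribution with one degree of freedom.

```python
import math
import random


def normalise(row):
    """Centre one marker's genotypes (0/1/2) and scale by sqrt(p(1-p))."""
    n = len(row)
    mean = sum(row) / n
    p = (1 + sum(row)) / (2 + 2 * n)  # smoothed allele frequency
    return [(g - mean) / math.sqrt(p * (1 - p)) for g in row]


def top_axis(genotypes, iters=300, seed=0):
    """Leading eigenvector of the samples-by-samples covariance (power iteration)."""
    x = [normalise(row) for row in genotypes]
    n, m = len(x[0]), len(x)
    cov = [[sum(r[i] * r[j] for r in x) / m for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    # Random start: the all-ones vector is in the null space because rows are centred.
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def remove_axis(values, axis):
    """Centre a vector, then subtract its projection onto a unit-length axis."""
    mean = sum(values) / len(values)
    centred = [val - mean for val in values]
    gamma = sum(c * a for c, a in zip(centred, axis))
    return [c - gamma * a for c, a in zip(centred, axis)]


def association_chi2(genotype, phenotype, k):
    """(N - k - 1) * r^2 for the Pearson correlation r of genotype and phenotype."""
    n = len(genotype)
    mg, my = sum(genotype) / n, sum(phenotype) / n
    sgy = sum((g - mg) * (y - my) for g, y in zip(genotype, phenotype))
    sgg = sum((g - mg) ** 2 for g in genotype)
    syy = sum((y - my) ** 2 for y in phenotype)
    return (n - k - 1) * sgy * sgy / (sgg * syy)


def eigenstrat_one_axis(genotypes, marker, phenotype):
    """Adjust marker and phenotype for the top axis, then test with k = 1."""
    axis = top_axis(genotypes)
    return association_chi2(remove_axis(marker, axis), remove_axis(phenotype, axis), k=1)


informative = [[2, 2, 2, 2, 0, 0, 0, 0]] * 30  # markers separating populations A and B
marker = [2, 2, 1, 2, 0, 1, 0, 0]
pheno = [1, 1, 1, 0, 0, 0, 0, 0]                # cases drawn mostly from population A
print(association_chi2(marker, pheno, k=0))     # uncorrected, about 2.49
print(eigenstrat_one_axis(informative + [marker], marker, pheno))
```

In this toy example the test marker's frequency differs between populations and cases come mostly from population A, so the uncorrected statistic is inflated. Adjusting for the top axis removes most of that shared signal. Real analyses typically remove several axes, not just one.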

 


EIGENSOFT version 1.0 is now available.  

Our new EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). This package has a built-in plotting script and supports multiple file formats and quantitative phenotypes. Source code, documentation and executables for using the EIGENSOFT package on a Linux platform can be downloaded here.


ADMIXMAP

  A program to model admixture using marker genotype data

HAPMIXMAP  

  A program to model HapMap haplotypes in genetic association studies using tag SNP genotypes

POOLSCORE

  A program to model case-control studies using DNA pools

IM and IMa

IM is a program, written with Rasmus Nielsen, for fitting an isolation-with-migration model to haplotype data drawn from two closely related species or populations. IM is based on a method originally developed by Rasmus Nielsen and John Wakeley (Nielsen and Wakeley 2001, Genetics 158:885). Large numbers of loci can be studied simultaneously, and different mutation models can be used.

IMa implements the same Isolation with Migration model, but uses a new method that estimates the joint posterior probability density of the model parameters. IMa also allows log-likelihood ratio tests of nested demographic models. IMa is based on a method described in Hey and Nielsen (2007, PNAS 104:2785–2790). IMa is faster than IM and more informative, because it provides access to the joint posterior density function, and it can be used for most (but not all) of the situations and options that IM supports.
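A log-likelihood ratio test of nested models compares twice the difference in maximized log-likelihoods with a chi-square reference distribution. The helper below is hypothetical and not part of IMa. It handles the one-parameter case using the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)), and the log-likelihood values are made up. When the fixed parameter lies on the boundary of its range (for example, a migration rate set to zero), the plain chi-square reference is only approximate.

```python
import math


def llr_test_1df(loglik_full, loglik_nested):
    """Likelihood-ratio test of a nested model that fixes one parameter.

    Returns (2 * difference in log-likelihoods, p-value from chi-square with 1 df).
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_nested))
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, not output from IMa.
stat, p = llr_test_1df(-1234.0, -1236.5)
print(stat, p)
```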

Structure 2.2

The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs. The basic algorithm was described by Pritchard, Stephens & Donnelly (2000). Extensions to the method were published by Falush, Stephens and Pritchard (2003, 2007).
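structure estimates allele frequencies and population memberships jointly by MCMC. The sketch below covers only a simpler calculation for one individual, treating each population's allele frequencies as known, under a no-admixture model with a uniform prior over populations. The function name and data layout are hypothetical. It assumes Hardy-Weinberg and linkage equilibrium within each population, so the likelihood is a product over loci and allele copies.

```python
import math


def assignment_posterior(genotype, freqs):
    """Posterior probability of origin for one diploid individual.

    genotype: list of (allele1, allele2) pairs, one per locus.
    freqs: {population: [{allele: frequency}, ...]} with one dict per locus.
    The factor 2 for heterozygotes is omitted because it cancels across populations.
    """
    loglik = {}
    for pop, loci in freqs.items():
        loglik[pop] = sum(math.log(f[a]) + math.log(f[b])
                          for (a, b), f in zip(genotype, loci))
    best = max(loglik.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(ll - best) for pop, ll in loglik.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}


freqs = {
    "A": [{"x": 0.9, "y": 0.1}, {"1": 0.7, "2": 0.3}],
    "B": [{"x": 0.1, "y": 0.9}, {"1": 0.2, "2": 0.8}],
}
print(assignment_posterior([("x", "x"), ("1", "2")], freqs))
```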

STRAT

The program STRAT is a companion program to structure. It implements a structured association method for association mapping, enabling valid case-control studies even in the presence of population structure. The method was described in Am. J. Hum. Genet. 67:170-181 (2000). Collaborators: Matthew Stephens, Noah Rosenberg, Peter Donnelly.

SelSimStanding

A program to simulate population genetic data under a model of a selective sweep on standing variation (Przeworski, Coop and Wall, Evolution 2005). The program will be available shortly. For further information, please contact Graham Coop (gcoop at bsd.uchicago.edu).

 

 

Arlequin

Arlequin is an exploratory population genetics software environment able to handle large samples of molecular data (RFLPs, DNA sequences, microsatellites), while retaining the capacity of analyzing conventional genetic data (standard multi-locus data or mere allele frequency data). A variety of population genetics methods have been implemented at either the intra-population or the inter-population level, and they can be conveniently selected and parameterized through a graphical interface. Arlequin has no equivalent in its field and should be very useful for analyzing the large data sets now being produced by the latest molecular techniques.

Sequence analyses

COALESCE: estimates the effective population size of a single
constant population using nonrecombining sequences.

DnaSP: estimates several measures of DNA sequence
variation within and between populations, linkage disequilibrium,
recombination, gene flow and gene conversion.

FLUCTUATE: estimates the effective population size and an
exponential growth rate of a single growing population using
nonrecombining sequences.

GENETREE: allows ancestral inference from DNA sequences
from single or subdivided populations which conform to the
infinite sites model.

ProSeq (PROcessor of SEQuences): aligns, edits, and
analyzes sequence data.

SITES: analysis of comparative DNA sequence data.
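Tools such as DnaSP report standard summaries of sequence variation. Below is a minimal sketch of two of them. Nucleotide diversity (pi) is the mean number of pairwise differences per site. Watterson's theta is the number of segregating sites S divided by a_n = sum of 1/i for i = 1..n-1, expressed here per site. Under the standard neutral model both estimate the population mutation rate theta = 4*Ne*mu. The sketch assumes equal-length aligned sequences with no gaps or missing data, and the function names are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns containing more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


def watterson_theta(seqs):
    """Watterson's estimator per site: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n / len(seqs[0])


sample = ["AAAA", "AAAT", "AATT", "AATT"]
print(nucleotide_diversity(sample), watterson_theta(sample))
```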

Allelic analyses

Arlequin: population genetics software environment able
to handle large samples of molecular data (RFLPs, DNA sequences,
microsatellites), while retaining the capacity of analyzing
conventional genetic data (standard multi-locus data or mere allele
frequency data).

BOTTLENECK: a program for detecting recent effective population
size reductions from allele frequency data.

FSTAT: calculates Weir & Cockerham's estimators of
F-statistics.

GDA: Software for the Analysis of Discrete Genetic Data.
Presents basic population genetic parameters, F-statistics, and
tests for disequilibrium.

GeneClass: identifies whether specific populations can be source
populations for an observed founder population.

GenePop on the web: population genetic software, also by FTP.
Genepop computes tests for Hardy-Weinberg equilibrium, differentiation
and disequilibrium as well as classical population parameters.

MIGRATE: estimates the effective population sizes and migration
rates of two constant populations using nonrecombining
sequences, microsatellite data or enzyme electrophoretic
data.

POPGENE: analysis of genetic variation among and within populations
using co-dominant and dominant markers.

TFPGA (Tools for population genetic analysis): a Windows
program for the analysis of allozyme and molecular data. Its
features include an intuitive word processor-like interface, the
ability to analyze both codominant and dominant markers, and support
for hierarchical data sets.
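Weir & Cockerham's estimator of F_ST (theta), as computed by FSTAT, partitions allele-frequency variance into three components: among populations (a), among individuals within populations (b) and within individuals (c). Then theta = a / (a + b + c). The sketch below is a hypothetical function, not FSTAT's code. It computes theta for a single biallelic locus from per-population allele frequencies, observed heterozygote proportions and sample sizes.

```python
def wc_theta(p, h, n):
    """Weir & Cockerham's theta for one biallelic locus.

    p: sample frequency of one allele in each population
    h: observed proportion of heterozygotes in each population
    n: number of diploid individuals sampled in each population
    """
    r = len(n)
    n_tot = sum(n)
    n_bar = n_tot / r
    n_c = (n_tot - sum(ni * ni for ni in n) / n_tot) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_tot
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_tot
    pq = p_bar * (1 - p_bar)
    # Variance components: among populations (a), among individuals
    # within populations (b), and within individuals (c).
    a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a / (a + b + c)


print(wc_theta([0.5, 0.5], [0.5, 0.5], [100, 100]))   # identical samples: slightly below 0
print(wc_theta([0.9, 0.1], [0.18, 0.18], [100, 100]))  # strongly differentiated samples
```

For several alleles or loci, the a, b and c components are summed across alleles and loci before the ratio is taken. Small negative values, as in the first call, are expected when populations do not differ.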

Quantitative genetics

QTL Cartographer.
Quantitative genetics packages compiled by B. Walsh.

Phylogenetics

COMPARE.
DAMBE: integrated software package for comparative molecular data.
HY-PHY: hypothesis testing using phylogenies.
PUZZLE: reconstructs trees from sequences by maximum likelihood.
PHYLIP.
TrExML: maximum likelihood analysis of nucleotide sequences
Big list of phylogenetics programs.


PopBio: population biology simulation package written for
the Macintosh; contains components for population growth,
population genetics, predator/prey interactions, and
interspecific competition.

Population genetic applets that can be run from the web,
including genetic drift, Wahlund effect, etc.

Populus: simulation software for population biology and
evolutionary ecology.
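The drift applets and simulation packages above model random changes in allele frequency. A minimal neutral Wright-Fisher sketch is shown below. The function is hypothetical and omits selection, mutation and migration. Each generation's 2N gene copies are drawn independently from the previous generation's allele frequency, which is binomial sampling.

```python
import random


def wright_fisher(pop_size, p0, generations, seed=None):
    """Allele-frequency trajectory under neutral Wright-Fisher drift.

    pop_size: number of diploid individuals N (so 2N gene copies).
    Returns a list of generations + 1 frequencies, starting with p0.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each new gene copy carries the allele with probability p.
        p = sum(rng.random() < p for _ in range(copies)) / copies
        trajectory.append(p)
    return trajectory


traj = wright_fisher(pop_size=50, p0=0.5, generations=200, seed=1)
print(traj[-1])  # frequency after 200 generations; 0.0 or 1.0 means loss or fixation
```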

 



2) Association study

BIMBAM: software for Bayesian IMputation-Based Association Mapping.

The program BIMBAM implements methods for association mapping, based on those described in

Servin, B and Stephens, M (2007). Imputation-based analysis of association studies: candidate genes and quantitative traits. PLoS Genetics, to appear.

BIMBAM can handle both large association studies (e.g. genome scans) and smaller studies of candidate genes/regions.

The software is distributed under the GNU General Public License (GPL). To register and download, click here.

Instructions are available here.

fastPHASE: software for haplotype reconstruction, and estimating missing genotypes from population data

The program fastPHASE implements methods described in

Scheet, P and Stephens, M (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet (to appear)

fastPHASE can handle larger data sets than PHASE (e.g. hundreds of thousands of markers in thousands of individuals), but does not provide estimates of recombination rates. Our experiments suggest that its haplotype estimates are slightly less accurate than those from PHASE, but its missing-genotype estimates appear to be similar to, or even slightly better than, those from PHASE.

The software is free for non-commercial use, and may be licensed for commercial use. To view the terms and conditions, and then proceed to download, click here.

PHASE: software for haplotype reconstruction, and recombination rate estimation from population data

The program PHASE implements methods for estimating haplotypes from population genotype data described in

Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73:1162-1169.

Stephens, M., Smith, N., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68:978-989.

Stephens, M., and Scheet, P. (2005). Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. American Journal of Human Genetics, 76:449-462.

The software also incorporates methods for estimating recombination rates, and identifying recombination hotspots:

Crawford et al. (2004). Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics, to appear.

The software is free for non-commercial use, and may be licensed for commercial use. To view the terms and conditions, and then proceed to download, click here.

Instructions for PHASE are included on the download site, or are also available here.
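Haplotype reconstruction is a statistical problem because an unphased genotype is usually compatible with many haplotype pairs. The sketch below uses a hypothetical function and does not implement the PHASE or fastPHASE model. It simply enumerates the pairs consistent with one genotype at biallelic SNPs, to show how fast the space grows: h heterozygous sites give 2^(h-1) distinct pairs.

```python
from itertools import product


def phase_configurations(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: copies (0, 1 or 2) of the alternative allele at each biallelic SNP.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed on both haplotypes
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix the first heterozygous site so mirror-image pairs are not listed twice.
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, (1,) + rest):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


for hap1, hap2 in phase_configurations([0, 1, 1, 2]):
    print(hap1, hap2)
```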

SCAT: Smoothed and Continuous AssignmenTs

The program SCAT (Smoothed and Continuous AssignmenTs) implements a Bayesian statistical method for estimating allele frequencies and assigning samples of unknown (or known) origin across a continuous range of locations, based on genotypes collected at distinct sampling locations. In brief, the idea is to assume that allele frequencies vary smoothly across the study region, so the allele frequency at any given location is estimated from observed genotypes at nearby sampling locations, with the nearest sampling locations given the greatest weight. Details are given in

Wasser, S., et al. (2004). PNAS, 101(41), 14844-14852.

SCAT is available here.
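The smoothing idea behind SCAT can be illustrated with a simple kernel-weighted average. This swaps in a different technique and is not SCAT's Bayesian model. The Gaussian kernel, the `bandwidth` parameter and the function name are assumptions made for illustration. Each sampled location contributes its allele counts with a weight that decays with squared distance from the target point, so the nearest sites dominate.

```python
import math


def smoothed_frequency(target, sites, bandwidth):
    """Kernel-weighted allele frequency at a target (x, y) location.

    sites: list of ((x, y), allele_count, allele_total) for sampled locations.
    Weights follow a Gaussian kernel in distance, so nearer sites count more.
    """
    num = den = 0.0
    for (x, y), count, total in sites:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * count
        den += w * total
    return num / den


sites = [((0.0, 0.0), 90, 100), ((10.0, 0.0), 10, 100)]
print(smoothed_frequency((5.0, 0.0), sites, bandwidth=2.0))  # midway: equal weights
print(smoothed_frequency((0.0, 0.0), sites, bandwidth=1.0))  # dominated by the site at the origin
```

Evaluating such estimates over a grid of candidate locations, and multiplying genotype likelihoods across loci, would give a crude origin estimate for an unknown sample. If every site is many bandwidths away from the target, all weights underflow to zero and the division fails.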

HOTSPOTTER: software for identifying recombination hotspots from population SNP data

This software by Na Li implements methods from

Li, N., and Stephens, M. (2003). Modelling linkage disequilibrium, and identifying recombination hotspots using SNP data. Genetics, to appear.

It is available free from here.




The following entries have not yet been arranged into categories:

BATWING: Bayesian Analysis of Trees With Internal Node Generation. BATWING is described in Wilson, Weale & Balding (2003). Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. Journal of the Royal Statistical Society: Series A, 166: 155-188.


From http://www.rannala.org/labpages/software.html

DMLE+: Multipoint Linkage Disequilibrium Mapping. All versions of the DMLE/DMLE+ program are now distributed from dmle.org.

GeneArtisan: Simulation of Genetic Markers in Case-Control Study Designs.
Version 1.1 Release Date 22 May 2005. Note: This release implements an improved algorithm for simulating samples that allows larger intervals to be used and dramatically improves execution time. Check this page for updates and bug reports.

|Linux GUI (v1.0)| Mac OS X GUI| Linux Terminal| Mac OS X Terminal| Windows Terminal| Source Code (Terminal)| Source Code (GUI)|

Reference: Y. Wang and B. Rannala 2005. In Silico Analysis of Disease-Association Mapping Strategies Using the Coalescent Process and Incorporating Ascertainment and Selection. American Journal of Human Genetics 76 (In Press).

Program Support: Questions and answers about general issues/problems using this program should be posted to the Genetics Software Forum (http://rannala.org/gsf). Bugs causing program crashes etc should be reported to ygwang@ucdavis.edu.

BayesAss+: Bayesian Estimation of Recent Migration Rates Using Multilocus Genotypes.
Version 1.3 Release Date 4 May 2005. Note: A new feature outputs the 95% credible interval for the migration rate estimates. This helps determine whether the marker data are informative about migration, by comparing the resulting posterior credible intervals with those expected under the prior (e.g., when the data are completely uninformative). Only the Windows release of version 1.3 is currently available. Releases for other operating systems will be posted soon.

|Linux (v.1.2)| Mac OS X (v.1.2)| Windows (v.1.3)| Source Code (v.1.2)| Documentation (v.1.3)| Input File|

Reference: G.A. Wilson and B. Rannala 2003. Bayesian inference of recent migration rates using multilocus genotypes. Genetics 163: 1177-1191.

Program Support: Questions and answers about general issues/problems using this program should be posted to the Genetics Software Forum (http://rannala.org/gsf). Bugs causing program crashes etc should be reported to gregwils@uclink.berkeley.edu.
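BayesAss+ version 1.3 reports 95% credible intervals for migration rates. An equal-tailed interval can be read off posterior MCMC samples as their 2.5% and 97.5% quantiles. The hypothetical helper below shows that calculation, interpolating linearly between order statistics, on a toy sample rather than real BayesAss+ output. Applying the same function to draws from the prior gives the reference interval that the release note suggests comparing against.

```python
def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from MCMC samples.

    Quantiles use linear interpolation between sorted samples.
    """
    s = sorted(samples)

    def quantile(q):
        pos = q * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


toy_samples = [i / 1000 for i in range(1001)]  # toy values, not BayesAss+ output
print(credible_interval(toy_samples))
```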

BayesAssNM: Bayesian Estimation of Recent Migration Rates Using Multilocus Genotypes When Migrants Are Not Included In the Sample.
Version 1.0 Release Date 21 January 2005. Note: This version of BayesAss is only to be used when genotypes are obtained from samples that are known not to be migrants, such as embryos.

|Linux| Source Code| Documentation|

 

Program Support: Questions and answers about general issues/problems using this program should be posted to the Genetics Software Forum (http://rannala.org/gsf). Bugs causing program crashes etc should be reported to gregwils@uclink.berkeley.edu.

Reference: Jehle R, GA Wilson, JW Arntzen, and T Burke. 2005. Contemporary gene flow and the spatio-temporal genetic structure of subdivided newt populations (Triturus cristatus, T. marmoratus). Journal of Evolutionary Biology (In Press).

BayesAss- : Bayesian Population Assignment of Haploid Organisms Using Multilocus Genotypes. Version 1.0 Release Date 18 July 2002 (Beta).

|Windows 95/98/NT|Mac Classic OS|Mac OS X| Input File|

Reference: M.C. Fisher, B. Rannala, V. Chaturvedi and J.W. Taylor. 2002. Disease surveillance in recombining pathogens: Multilocus genotypes identify sources of human Coccidioides infections. Proceedings of the National Academy of Sciences USA 99:9067-9071.

oncSpectrum : Likelihood Analysis of the Spectrum of Somatic Mutations in Cancers. Version 1.0 Release Date 2 Mar 2004.

oncSpectrum_v1_tar.gz (Windows executable, source code, example input files, and documentation)

Reference: Z. Yang, S. Ro and B. Rannala. 2003. Likelihood models of somatic mutation and codon substitution in cancer genes. Genetics 165: 695-705.

BACTFREQ: Maximum Likelihood Estimation of Bacterial Allele Frequencies (replaces BFREQ program). Program by Eric Anderson (Univ. Washington), Release Date 10 June 2001.

|Windows 95/98/NT| Macintosh|Source Code in C|

Users of this program are requested to cite both the papers listed below. Questions about the program should be addressed to eriq@u.washington.edu.

Reference: E. C. Anderson and P. A. Scheet. 2001. Improving the estimation of bacterial allele frequencies. Genetics (July, to appear).

Reference: B. Rannala, W. Qiu, and D. E. Dykhuizen. 2000. Methods for estimating gene frequencies and detecting selection in bacterial populations. Genetics 155: 499-508.

BDMC21: Maximum Likelihood Estimation of Allele Ages. Version 2.1. Release Date 7 Oct 1998.

|Windows 95/98/NT| Linux i386|Documentation| Input File|

Reference: Slatkin, M., and B. Rannala. 1997. Estimating the age of alleles by use of intraallelic variability. American Journal of Human Genetics 60: 447-458.

Immanc: Detecting Immigrants Using Multilocus Genotypes. Version 5.0. Release Date 8 Oct 1998.

|Windows 95/98/NT| Linux i386 | Mac OS X | Mac Classic |Documentation| Input File|

Reference: Rannala, B., and J. L. Mountain. 1997. Detecting immigration by using multilocus genotypes. Proceedings of the National Academy of Sciences USA 94: 9197-9201.

Legacy Computer Programs (pre-1997)
PMLE: Maximum Likelihood Estimation of Gene Flow Version 2.0. Release Date 8 Oct 1998.

|Windows 95/98/NT| Linux i386|Documentation |Input File|

Reference: Rannala, B., and J. A. Hartigan. 1996. Estimating gene flow in island populations. Genetical Research 67:147-158.

 

