%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
% This is the software for the Bayesian haplotype analysis method developed  %
% by Liu, J.S., Sabatti, C., Teng, J., Keats, B.J.B. and N. Risch in article %
% "Bayesian Analysis of Haplotypes for Linkage Disequilibrium Mapping."      %
% Please cite the article if you wish to use the program.                    %
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%

====================
Version history:

Version 1.0 by *****
  * Initial program package, available from
    http://www.fas.harvard.edu/~junliu/TechRept/01folder/diseq_prog.tar.gz

Version 1.1 by Xin Lu, Mar 21, 2002:
  * User interface revised: help information added, and some parameters
    can now be set on the command line.
  * Supports datasets with or without missing data.
  * Tries several random starts on the same dataset (default: 10) and, for
    each, calculates the mean, standard deviation and 95% PI of the
    position estimate and the mean log likelihood. The run with the largest
    average log likelihood is chosen as the final result.
  * Available from
    http://www.fas.harvard.edu/~junliu/TechRept/01folder/blade.tar.gz
====================

====================
ENVIRONMENT FOR THIS PROGRAM
====================
This binary executable runs under Linux on a PC. If you need an executable
for another system, or the source files to build your own executable,
please contact Jun S. Liu (jliu@stat.harvard.edu).

====================
COMMANDS TO EXECUTE
====================
Some parameters can be specified on the command line. Use the following
command to print help on the command-line parameters:

>blade -h

BLADE, Bayesian LinkAge DisEquilibrium mapping
reference: Jun S. Liu, et. al., Genome Research, 11:1716-1724, 2001
version 1.1 by Xin Lu, Mar 21, 2002

Usage: blade [OPTION PARAMETER]...
Options:
  -h: For help on usage.
  -b: Number of burn-ins, recommend to be 20% of samples, default 1000
  -c: Number of clusters, in addition to the 0 cluster corresponding to
      absence of founder effect. Default: 2
  -e: Specifies a parameter for the Metropolis-Hastings step used for
      updating the disease location. It corresponds to 2s in the
      manuscript. 0.075 is a recomended value.
  -g: Whether want to enter a guess of ancestral haplotype(s) for cluster(s)
      T or t, for input a guess, then the user will be promoted to input them
      F or f, for not input, the program will generate random one(s)
      default: T
  -k: Set the retry times, default 10
      The final result is the one with biggest average log likelihood
  -m: Model for control population: 0 equilibrium, 1 markov
      default 1
  -n: Set whether exist missing data in datafiles, default: T
      The N alleles are represented by 0 ~ N-1, and the missing data is
      denoted by N.
  -p: The path where the data stores, default case1/
  -r: The seed of random number generater, default 0
      The program will set a random seed if without a user input
  -s: Total number of Monte Carlo samples, default 4000
  -u: Maximal mutation rate to be used (minimal is MRATE/10),
      default 0.000200
  -v: Verbose mode
      T or t for verbose mode, print iteration step, disease position
      and log likelihood for every 20 steps
      F or f for silence mode, only final results will be printed
      default: T

=======================
INPUT AND OUTPUT FILES
=======================

-----
input
-----
The program takes the following files as input:

1) disease
   Contains the diseased haplotypes, one haplotype per row (rows are
   separated by a newline). Each marker is coded with an allele value
   0, 1, 2, ..., N-1, where N is the maximum number of alleles over all
   the markers in the dataset. By default the program treats the value N
   as missing data; this can be disabled with the -n option (-n f).
   Markers are separated from each other by blank spaces or tabs.

2) control
   Contains the control haplotypes, coded in the same way as the diseased
   haplotypes.

3) unphased
   If there are unphased diseased haplotypes, they are entered here (two
   adjacent rows contain the genotype of one individual).

4) lambda
   Distances between markers in Morgans (one line, separated by blank
   spaces).

These files should be stored in a separate directory, which is specified on
the command line with the -p option. If the user does not supply this
parameter, the program looks for the input files in case1/, the default
input path.

------
output
------
The output files are:

  * result: sample draws of the disease location (recorded after the
    initial burn-in period, like all the sample draws in any output file).
  * ancestor1, ancestor2, ...: sample draws of the ancestral haplotype for
    cluster1, cluster2 and so on.
  * cluster: posterior probabilities for each disease haplotype to be in
    each cluster.
  * cluster1, cluster2, ...: disease population ordered according to
    posterior probability of being in cluster1, cluster2, and so on.
  * maxlik: likelihood for each sample draw.
  * age1, age2, ...: sample draws of the ages of the mutations.
  * mrate: sample draws of the mutation rate.
  * rec1: sample draws of the interval where R_1 is located.
  * rec2: sample draws of the interval where R_2 is located.

The program runs on the same dataset several times (default 10; this can be
changed with the -k option). The result files for each retry are saved in
separate directories named try1, try2, ..., under the source data
directory. For each run, the average maxloglik over the steps after burn-in
is calculated and printed, along with the mean, standard deviation and
95% PI of the position. The run with the largest average maxloglik can be
taken as the best result. The other runs can be kept for reference or
simply deleted.

======================
Questions and Answers
======================

A. What are "MODEL" and "METRO"?

Ans: There are two choices for MODEL, 0 or 1.
     For MODEL 0, we assume that the control population is in equilibrium;
     for MODEL 1, we assume a first-order Markov model for the control
     population. MODEL is set with the -m option.
     METRO specifies a parameter of the Metropolis-Hastings algorithm used
     for updating the disease location. It corresponds to 2s in our paper
     and is set with the -e option.
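
To make the difference between the two MODEL choices concrete, here is a
minimal Python sketch that fits both kinds of model to a toy set of control
haplotypes and scores one haplotype under each. It is an illustration only,
not the code used inside blade. Assumptions: "equilibrium" is read as
markers being independent of each other, there is no missing data, and a
pseudocount of 1 smooths the estimates. The function names and the toy data
are hypothetical.

```python
import math
from collections import Counter


def fit_equilibrium(haplotypes, n_alleles, pseudo=1.0):
    """Per-marker allele frequencies; markers are treated as independent."""
    n_markers = len(haplotypes[0])
    total = len(haplotypes) + pseudo * n_alleles
    freqs = []
    for j in range(n_markers):
        counts = Counter(h[j] for h in haplotypes)
        freqs.append([(counts[a] + pseudo) / total for a in range(n_alleles)])
    return freqs


def fit_markov(haplotypes, n_alleles, pseudo=1.0):
    """Transition tables P(allele at marker j | allele at marker j-1)."""
    n_markers = len(haplotypes[0])
    trans = []
    for j in range(1, n_markers):
        counts = Counter((h[j - 1], h[j]) for h in haplotypes)
        table = []
        for a in range(n_alleles):
            row_total = sum(counts[(a, b)] for b in range(n_alleles))
            row_total += pseudo * n_alleles
            table.append([(counts[(a, b)] + pseudo) / row_total
                          for b in range(n_alleles)])
        trans.append(table)
    return trans


def loglik_equilibrium(hap, freqs):
    """Log probability of a haplotype under independent markers."""
    return sum(math.log(freqs[j][a]) for j, a in enumerate(hap))


def loglik_markov(hap, freqs, trans):
    """Log probability under a first-order Markov chain along the markers."""
    ll = math.log(freqs[0][hap[0]])  # first marker uses its marginal frequency
    for j in range(1, len(hap)):
        ll += math.log(trans[j - 1][hap[j - 1]][hap[j]])
    return ll


# Toy control haplotypes (rows) over three biallelic markers (columns),
# coded 0..N-1 as in the control file. Adjacent markers are strongly
# correlated here.
control = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 0],
]

freqs = fit_equilibrium(control, n_alleles=2)
trans = fit_markov(control, n_alleles=2)

hap = [0, 0, 1]
ll_eq = loglik_equilibrium(hap, freqs)
ll_mk = loglik_markov(hap, freqs, trans)
print("equilibrium log-likelihood:", ll_eq)
print("Markov log-likelihood:     ", ll_mk)
```

In this toy data adjacent markers are correlated, so the Markov model
assigns the observed haplotype a higher log-likelihood than the independent
model does. That kind of dependence between neighbouring markers is what a
first-order Markov model can capture and an independence model cannot.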