The suggested environment is a Linux server running Apache 2.0 or later, MySQL (5.0 to 5.2, or 5.3 or later) and PHP 5.0 or later with the mysqli module. The environment also needs to meet the requirements for running BLAT and Blastn; for example, the system needs libidn to support Blastn.
File | Format | Number |
---|---|---|
Reference genome | FASTA file | 1 per organism |
CDS | FASTA file | 1 per organism |
Genome Annotation | GTF or other format | 1 per organism |
Test statistics | Tab separated format is preferred | 1 or more per organism |
Users can generate the annotation mysql infile using the Perl scripts listed below, found in the folder ‘install/scripts’:
Format | Script |
---|---|
GFF3 | swav_annotation_format_change_gff3.pl |
GTF | swav_annotation_format_change_gtf.pl |
swav_annotation_format_change_gff3.pl INFILE OUTFILE
Alternatively, users can write their own scripts to convert the source annotation file into the following format:
id | chr | type | start | end | strand | name | annotation |
---|---|---|---|---|---|---|---|
Note: ‘id’ starts from 1 and increases by 1 for each row. 'type' needs to include 'RNA', 'exon', 'CDS' and 'intron'. 'chr', 'start', 'end', 'strand', 'name' and 'annotation' denote the chromosome/scaffold name, start position, stop position, strand, name of the transcript/gene, and functional annotation of the gene/transcript, respectively.
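For users who write their own conversion script, the column layout above can be sketched in Python as follows. This is only an illustration, not the logic of the bundled Perl scripts: the tab delimiter, the absence of a header row, the function name `annotation_rows` and the sample gene `geneA` are all assumptions made for the example.

```python
# Column order of the annotation mysql infile, as described above.
COLUMNS = ["id", "chr", "type", "start", "end", "strand", "name", "annotation"]


def annotation_rows(features):
    """Return one tab-separated line per feature, numbering ids from 1.

    Assumption: the infile is tab-separated with no header row.
    """
    rows = []
    for row_id, feature in enumerate(features, start=1):
        values = [row_id] + [feature[column] for column in COLUMNS[1:]]
        rows.append("\t".join(str(value) for value in values))
    return rows


# Hypothetical features, for illustration only.
features = [
    {"chr": "Chr1", "type": "RNA", "start": 100, "end": 900,
     "strand": "+", "name": "geneA", "annotation": "example annotation"},
    {"chr": "Chr1", "type": "CDS", "start": 150, "end": 850,
     "strand": "+", "name": "geneA", "annotation": "example annotation"},
]

for row in annotation_rows(features):
    print(row)
```

The 'chr' values written here must match the sequence titles of the genome FASTA file, and the 'name' values must match the titles of the CDS FASTA file.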
In the fasta format file of the reference genome, the title of each sequence needs to be clean.
For example, “>Chr1” is correct but “>Chr1 length=12341412” is incorrect.
The names of the chromosomes/scaffolds need a one-to-one map to ‘chr’ in the annotation mysql infile.
SWAV provides the script swav_chr_formatting.pl to clean the sequence titles of the genome file:
swav_chr_formatting.pl ORIGINAL_GENOME_FASTA_FILE OUT_GENOME_FASTA_FILE
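To make the "clean title" rule concrete, here is a minimal Python sketch that keeps only the first whitespace-delimited token of each FASTA title. It is not the implementation of swav_chr_formatting.pl, whose exact rules are not described here; the function name `clean_fasta_titles` is hypothetical.

```python
def clean_fasta_titles(lines):
    """Return FASTA lines with every title cut at the first whitespace."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # ">Chr1 length=12341412" becomes ">Chr1"
            fields = line[1:].split()
            cleaned.append(">" + (fields[0] if fields else ""))
        else:
            # Sequence lines are passed through unchanged.
            cleaned.append(line)
    return cleaned


raw = [">Chr1 length=12341412\n", "ACGT\n", ">Chr2\n", "GGCC\n"]
print(clean_fasta_titles(raw))
```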
In the fasta format file of the CDS, the title of each sequence needs to be clean.
For example, “>YBR024W” is correct but “>YBR024W_mRNA cds chromosome:R64-1-1:II:289445:290350:1” is incorrect.
The gene name needs a one-to-one map to ‘name’ in the annotation mysql infile.
SWAV provides the script swav_cds_formatting.pl to clean the sequence titles of the CDS file:
swav_cds_formatting.pl ORIGINAL_CDS_FASTA_FILE OUT_CDS_FASTA_FILE
Users can generate the chromosome length mysql infile using the Perl script swav_chrlen.pl.
swav_chrlen.pl -i INFILE -o OUTFILE
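The underlying computation is simply the length of each sequence in the genome FASTA file. The Python sketch below shows that idea; it does not reproduce the output format of swav_chrlen.pl, which is not specified here, and the function name `chromosome_lengths` is an assumption.

```python
def chromosome_lengths(lines):
    """Return a dict mapping each sequence title to its total length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first token of the title as the chromosome name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            # Sequences may span several lines; add up every line.
            lengths[name] += len(line)
    return lengths


genome = [">Chr1", "ACGT", "AC", ">Chr2", "GG"]
print(chromosome_lengths(genome))
```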
The mysql infile of each test statistic needs to follow the format:
id | chr | pos | val |
---|---|---|---|
Here, ‘id’ starts from 1 and is increased by 1 each time. ‘chr’ is chromosome. ‘pos’ is the center of the test window. ‘val’ is the value of test statistics. Users need to transform each test statistic into the mysql infile format.
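As a rough illustration of this transformation, the Python sketch below turns windowed results into rows of id, chr, pos and val. The input tuple layout, the tab delimiter and the integer rounding of the window center are assumptions for the example; adapt them to the output of your analysis tool.

```python
def to_infile_rows(windows):
    """Convert (chr, window_start, window_end, value) tuples to infile rows.

    Assumptions: rows are tab-separated, and 'pos' is the integer
    midpoint of the window.
    """
    rows = []
    for row_id, (chrom, start, end, value) in enumerate(windows, start=1):
        pos = (start + end) // 2  # center of the test window
        rows.append(f"{row_id}\t{chrom}\t{pos}\t{value}")
    return rows


# Hypothetical windows, for illustration only.
windows = [("Chr1", 1, 1000, 0.25), ("Chr1", 101, 1100, 0.5)]
for row in to_infile_rows(windows):
    print(row)
```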
Unpack the source zip file swav.zip. The structure of the folder is:
Item | Description |
---|---|
web | The web program of SWAV. Note: the hypertext access file .htaccess is at the root. Check that "AllowOverride All" is set in the Apache configuration file. |
install | Scripts and sample files to install SWAV |
swav.sql | The SQL file to initialize the SWAV database |
Place the content of “web” into the html folder of Apache or the appropriate position of the web service.
mv web /var/www/html
Create a database named ‘swav’ in MySQL, execute ‘swav.sql’ to create the basic tables, and grant privileges on ‘swav’ to an account that will manage the database.
Edit the config file ‘application/config/database.php’ in the folder ‘web’. Specifically, find the following lines and set each item to match your MySQL settings for SWAV.
'hostname' => 'localhost or an IP address',
'username' => 'username to visit the MySQL database of SWAV',
'password' => 'password to visit the MySQL database of SWAV',
'database' => 'the name of newly generated SWAV database'
Users need to set the base URL in the file 'application/config/config.php' in the folder ‘web’. Find the following line and set it.
$config['base_url'] = 'The web address of your SWAV';
Visit your SWAV site, e.g. myswav.org. Go to ‘setting’ (myswav.org/setting). The initial username and password are ‘root’ and ‘swav’.
Go to “Genome->Organism” and add an organism, e.g., Saccharomyces cerevisiae. The sign of an organism needs to be alphabetic, in lower case and without blank spaces, e.g., ‘scerevisiae’ can be the sign of Saccharomyces cerevisiae.
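The rule for organism signs can be checked before the sign is used in file and table names. The Python sketch below is one reading of that rule (lower-case ASCII letters only); the function name `is_valid_sign` is hypothetical and SWAV itself may validate the sign differently.

```python
import re

# One or more lower-case letters, nothing else (no spaces, digits or symbols).
SIGN_PATTERN = re.compile(r"[a-z]+")


def is_valid_sign(sign):
    """Return True if the sign is alphabetic, lower case and has no spaces."""
    return SIGN_PATTERN.fullmatch(sign) is not None


print(is_valid_sign("scerevisiae"))
print(is_valid_sign("S cerevisiae"))
```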
SWAV offers scripts (in the folder 'install/scripts') to help users upload data. Before running the scripts, users need to set the database connection config file 'conf'.
$mysql_database="THE DATABASE NAME";
$mysql_user="THE USER WITH PRIVILEGES OF THE DATABASE";
$mysql_passwd="THE PASSWORD OF THE USER";
The user of the database needs to have the 'FILE' privilege. If the user is not 'root', the following commands, run as the MySQL root administrator, add the 'FILE' privilege.
REVOKE ALL PRIVILEGES ON *.* FROM 'THE USER NAME'@'localhost';
REVOKE GRANT OPTION ON *.* FROM 'THE USER NAME'@'localhost';
GRANT FILE ON *.* TO 'THE USER NAME'@'localhost' WITH MAX_QUERIES_PER_HOUR 0 MAX_CONNECTIONS_PER_HOUR 0 MAX_UPDATES_PER_HOUR 0 MAX_USER_CONNECTIONS 0;
Upload the annotation mysql infile using the script ‘upload_annotation.pl’
upload_annotation.pl -c DATABASE_CONFIG_FILE -i ANNOTATION_INFILE -s SIGN_OF_A_ORGANISM
Upload the chromosome length infile using the script ‘upload_chr_len.pl’
upload_chr_len.pl -c DATABASE_CONFIG_FILE -i CHROMOSOME_LENGTH_INFILE -s SIGN_OF_A_ORGANISM
Users can view the status of newly generated tables in “Genome->Gene model” and “Genome->Chromosome Info”.
To enable the search function, users need to move the fasta format genome file to the position 'res/genomes' in the web folder and rename the file in the format of [sign of the organism].fa
mv genome.fa web/res/genomes/[sign of the organism].fa
Likewise, move the fasta format CDS file to the position 'res/cds' in the web folder, rename the file in the format of [sign of the organism].fa, and build a BLAST database from it.
mv cds.fa web/res/cds/[sign of the organism].fa
cd web/res/cds
../programs/makeblastdb -dbtype nucl -in [sign of the organism].fa -out [sign of the organism].fa
Users can add a track in “Test statistics->Track list”. A single track displays only one test statistic, while a composite track displays more than one test statistic.
If users newly set a composite track, they need to add subtracks in “Test statistics->Subtrack”.
The sign of the track needs to be alphabetic in lower-case and without blank spaces. The full name will be displayed in the main panel of SWAV. Track color is the color of the curves and points in the drawing of the test statistics. Track height is the height of the track image. If more than one track exists, ‘Rank’ will adjust their order from high to low.
Users need to upload track data to the tables listed in “Test statistics->Data”. The script ‘upload_indicator.pl’ in the folder ‘prepare’ performs the upload.
upload_tracks.pl -c DATABASE_CONFIG_FILE -i TRACK_FILE -t TABLE_NAME -p PARAMETER_STRING
The format of PARAMETER_STRING is [the first line to get information]:[column of chromosome]:[column of position]:[column of track value]
e.g.,
upload_tracks.pl -c conf -i theta.dom -t indicator_scerevisiae_theta_dom -p 2:1:2:3
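The meaning of PARAMETER_STRING can be illustrated with a short Python sketch that parses the four fields and uses them to pull chromosome, position and value out of a track file. This is not the code of the upload script: it assumes 1-based line and column numbers and tab-separated input, and the names `TrackParams`, `parse_parameter_string` and `extract_track_rows` are hypothetical.

```python
from typing import NamedTuple


class TrackParams(NamedTuple):
    first_line: int  # first line that holds data (assumed 1-based)
    chr_col: int     # column of chromosome (assumed 1-based)
    pos_col: int     # column of position
    val_col: int     # column of track value


def parse_parameter_string(text):
    """Parse 'first_line:chr_col:pos_col:val_col' into TrackParams."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated integers")
    return TrackParams(*(int(part) for part in parts))


def extract_track_rows(lines, params):
    """Return (chr, pos, val) tuples from tab-separated track lines."""
    rows = []
    for line in lines[params.first_line - 1:]:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        rows.append((fields[params.chr_col - 1],
                     int(fields[params.pos_col - 1]),
                     float(fields[params.val_col - 1])))
    return rows


# Hypothetical track file with a one-line header, so data starts at line 2.
track_lines = ["chr\tpos\tvalue\n", "Chr1\t500\t0.25\n", "Chr1\t600\t0.5\n"]
params = parse_parameter_string("2:1:2:3")
print(extract_track_rows(track_lines, params))
```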
In “Test statistics->Cutoffs”, users can set lines of cutoffs, which will be displayed in the main page.
The script get_cutoff.pl can help to obtain a designated cutoff for a ranking test.
get_cutoff.pl -i TRACK_FILE -c CUTOFF_PERCENTAGE -p PARAMETER_STRING
The format of PARAMETER_STRING is [the first line to obtain information]:[column of track value]
The cutoff percentage is the percentage of track values in descending order
e.g.,
get_cutoff.pl -i theta.dom -c 0.05 -p 2:3
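One way to read the cutoff percentage is: sort the track values in descending order and take the value that closes the top fraction. The Python sketch below implements that reading only. How get_cutoff.pl rounds the rank or handles ties is not documented here, so the rounding rule and the function name `cutoff_value` are assumptions.

```python
import math


def cutoff_value(values, percentage):
    """Return the value at the boundary of the top `percentage` of values.

    Assumption: the rank is floor(n * percentage), with a minimum of 1.
    """
    if not values:
        raise ValueError("no track values given")
    if not 0 < percentage <= 1:
        raise ValueError("percentage must be in (0, 1]")
    ranked = sorted(values, reverse=True)
    count = max(1, math.floor(len(ranked) * percentage))
    return ranked[count - 1]


# Hypothetical track values 1..20; the top 25% are 20, 19, 18, 17 and 16.
track_values = list(range(1, 21))
print(cutoff_value(track_values, 0.25))
```

The returned value can then be entered as a cutoff line in “Test statistics->Cutoffs”.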
“Test statistics->Chg Password” is the section to change the password of user ‘root’.
On the board, clicking a gene model or a point of a test statistic displays its related information.
Each test statistic may have one track (single track) or multiple tracks (multi-track). For a multi-track, track names are numbered and listed after the test statistic name.
Users can set one or more cutoffs and display the cutoff lines in the main pane.
Button/Inputs | Function |
---|---|
(image) | Choose an organism |
(image) | Set the position to view |
(image) | Go to the view of the selected organism and the entered position |
(image) | Move the genome position to the left |
(image) | Move the genome position to the left quickly |
(image) | Move the genome position to the right quickly |
(image) | Move the genome position to the right |
(image) | Zoom in the view |
(image) | Zoom out the view |
(image) | Set the focus bar. In the pop-up dialog, users can choose to draw a focus bar ranging from position X to position Y, or not to draw one. Users can set the color, position and width of the bar. |
(image) | Export the drawing of one track. In the pop-up dialog, users can choose to include or exclude the gene annotation, cutoff lines or fitting curves, and set the size of the generated image. The size of the image needs to be larger than 1000*600. Clicking “Preview & Download” generates a new page to preview the drawing. Clicking "Download" starts the download of the image. |
(image) | Export one test statistic in the view in csv format |
BLAT/Blast | Users can BLAT a sequence or a list of sequences in fasta format against the genome of the selected organism and view tracks in the region mapped to one query sequence. The query sequence is shown in orange in the browser. Users can choose to include the query sequence in the generated image. |
By Positions | After entering a list of genome sites, users obtain a list of links to view the tracks at those sites. |
By Gene names | After entering a list of gene/transcript names, users obtain a list of links to view the tracks of those genes. |
Files in the folder 'install/samples/' help users to build their SWAV.
File | Description |
---|---|
scerevisiae.gff3 | The GFF3 format file of Saccharomyces cerevisiae from Ensembl |
scerevisiae.gtf | The GTF format file of Saccharomyces cerevisiae from Ensembl |
scerevisiae_genome.fa | The genome of Saccharomyces cerevisiae from Ensembl |
theta.dom | This file is the result of Thetas, Tajima's D and Neutrality tests of domesticated yeast genomes (from PRJEB1973) by ANGSD |
theta.wild | This file is the result of Thetas, Tajima's D and Neutrality tests of wild yeast genomes (from PRJEB1973) by ANGSD |
clr.list.dom | This file is the result of CLR tests of domesticated yeast genomes (from PRJEB1973) by ANGSD and SweepFinder2 |
clr.list.wild | This file is the result of CLR tests of wild yeast genomes (from PRJEB1973) by ANGSD and SweepFinder2 |
fst | This file is the result of Fst tests of wild yeast genomes (from PRJEB1973) by ANGSD |
The following general pipeline can be used to perform population genetics analysis of 12 resequenced yeast genomes. For specifics, please refer to the manuals of Bowtie2, Samtools, ANGSD and SweepFinder2.
1. Clean the downloaded genome file
pure_chr.pl Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa sc_genome.fa
2. Perform alignment of fastq files and the reference genome
bowtie2-build sc_genome.fa index
for i in *.fastq; do bowtie2 -p 8 -x ../index $i -S $i.sam; done
samtools faidx sc_genome.fa
for i in *.sam; do samtools view -bt ../sc_genome.fa.fai $i > $i.bam; done
for i in *.bam; do samtools rmdup -sS $i $i.clean; done
for i in *.bam; do samtools sort $i -o $i.sorted; done
for i in *.sorted; do mv $i `echo $i | sed -e "s/\.fastq\.sam\.bam\.sorted/.bam/g"`; done
3. Make two files dom.txt and wild.txt listing sorted bam files.
4. Perform Thetas, Tajima's D and neutrality tests
angsd -bam dom.txt -doSaf 1 -anc /genome/zhuzl/yeast/sc_genome.fa -GL 1 -P 24 -out out
~/bin/angsdfiles/angsd/misc/realSFS out.saf.idx -P 24 > out.sfs
angsd -bam dom.txt -out outdom -doThetas 1 -doSaf 1 -pest out.sfs -anc /genome/zhuzl/yeast/sc_genome.fa -minMapQ 30 -minQ 20 -GL 1
misc/thetaStat do_stat outdom.thetas.idx
misc/thetaStat do_stat outdom.thetas.idx -win 1000 -step 100 -outnames theta.thetasWindow.gz
5. Call CLR
angsd -bam dom.txt -GL 1 -out out -doMaf 2 -minMapQ 30 -minQ 20 -minMaf 0.01 -fold 1 -P 12 -SNP_pval 0.05 -doMajorMinor 1
gunzip out.mafs.gz
SweepFinder2 -sg 1000 dom.txt clr.list.dom
6. Call FST
angsd -b dom.txt -anc ../sc_genome.fa -out pop1 -dosaf 1 -gl 1
angsd -b wild.txt -anc ../sc_genome.fa -out pop2 -dosaf 1 -gl 1
angsd -b list1 -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1
angsd -b list2 -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1
misc/realSFS pop1.saf.idx pop2.saf.idx >pop1.pop2.ml
misc/realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1.pop2.ml -fstout here
misc/realSFS fst stats here.fst.idx
misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow