SWAV Sliding Window Analysis Viewer

System requirements

A suggested setting of the environment is a Linux server that has Apache 2.0 or later version, MySQL 5.0 to 5.2 , 5.3 or later version and PHP 5.0 or later version with mysqli module. Moreover, the environment need to fulfill the requirements to run BLAT and Blastn, e.g., the system needs to have libidn to support Blastn.

Preparation

Input files

File Format Number
Reference genome FASTA file 1 per organism
CDS FASTA file 1 per organism
Genome Annotation GTF or other format 1 per organism
Test statistics Tab separated format is preferred 1 or more per organism

Preparation of input files

Annotation

Users can generate the annotation mysql infile using Perl scripts listed below in the folder ‘install/scripts’,

File Format
GFF3 swav_annotation_format_change_gff3.pl
GTF swav_annotation_format_change_gtf.pl
swav_annotation_format_change_gff3.pl INFILE OUTFILE

or write scripts to change the source annotation file into the format as:

id chrtype startend strand name annotation

Note: ‘id’ starts from 1 and can be increased by 1 each time. 'type' needs to include 'RNA','exon','CDS' and 'intron'. 'chr', 'start', 'end', 'strand', 'name', 'annotation' denote chromosome/scaffold name, start postion, stop position, name the transcript/gene, functional annotation of the gene/transcript.

Reference genome

In the fasta format file of the reference genome, the title of each sequence needs to be clean.
For example, “>Chr1” is correct but “>Chr1 length=12341412” is incorrect.
The names of the chromosomes/scaffolds need a one-to-one map to ‘chr’ in the annotation mysql infile.

We offered a script to process chr file swav_chr_formatting.pl

swav_chr_formatting.pl ORIGINAL_GENOME_FASTA_FILE OUT_GENOME_FASTA_FILE

CDS File

In the fasta format file of the CDS, the title of each sequence needs to be clean.
For example, “>YBR024W” is correct but “>YBR024W_mRNA cds chromosome:R64-1-1:II:289445:290350:1” is incorrect.
The gene name needs a one-to-one map to ‘name’ in the annotation mysql infile.

We offered a script to process cds file swav_cds_formatting.pl

swav_cds_formatting.pl ORIGINAL_CDS_FASTA_FILE OUT_CDS_FASTA_FILE

Chromosome length

Users can generate the chromosome length mysql infile using the Perl script swav_chrlen.pl.

swav_chrlen.pl -i INFILE -o OUTFILE

Test statistics

The mysql infile of each test statistic need to follow the format:

idchrposval

Here, ‘id’ starts from 1 and is increased by 1 each time. ‘chr’ is chromosome. ‘pos’ is the center of the test window. ‘val’ is the value of test statistics. Users need to transform each test statistic into the mysql infile format.

Installation

Initialization

Unpack the source zip file swav.zip. The structure of the folder is

webthe web program of SWAV.
Note: the hypertext access file .htaccess is at the root. You need to check if "AllowOverride All" is in the configure file of Apache.
installscripts and sample files to install SWAV
swav.sqlthe sql file to initialize SWAV database

Place the content of “web” into the html folder of Apache or the appropriate position of the web service.

mv web /var/www/html

Configuration of database

Create a database named ‘swav’ in MySQL, execute ‘swav.sql’ to create basic tables and grant privileges of ‘swav’ to a account to master the database.
Change the config file ‘application/config/ database.php’ in the folder ‘web’. Specifically, find the following lines, and then set the content of each item as your setting in the mysql database for SWAV.

'hostname' => 'localhost or a IP address',
'username' => 'username to visit the MySQL database of SWAV',
'password' => 'password to visit the MySQL database of SWAV',
'database' => 'the name of newly generated SWAV database'

Setting the base URL

Users need to set the base URL at the file 'application/config/config.php' in the folder ‘web’. Find the following line as listed below and set it.

setting $config['base_url'] = 'The web address of your SWAV';

Uploading genome information

Visit your SWAV site, e.g. myswav.org. Go to ‘setting’ (myswav.org/setting). The initial username and password is ‘root’ and ‘swav’.
Go to “Genome->Organism” and add an organism, e.g., saccharomyces cerevisiae. The sign of an organism need to be alphabetic, in lower-case without blank spaces, e.g., ‘scerevisiae’ can be the sign of saccharomyces cerevisiae.

SWAV offers scripts (in the folder 'install/scripts') to faciliate users to upload data. Before running the scripts, users need set the database connection config file 'conf'.

$mysql_database="THE DATABASE NAME";
$mysql_user="THE USER WITH PRIVILEGES OF THE DATABASE";
$mysql_passwd="THE PASSWORD OF THE USER";

The user of the database needs to have the privilege 'FILE'. If the user is not 'root', users can use the following commands in the mysql ROOT administration to add 'FILE' privilege.

REVOKE ALL PRIVILEGES ON * . * FROM 'THE UESR NAME'@'localhost';
REVOKE GRANT OPTION ON * . * FROM 'THE UESR NAME'@'localhost';
GRANT FILE ON * . * TO 'THE UESR NAME'@'localhost' WITH MAX_QUERIES_PER_HOUR 0 MAX_CONNECTIONS_PER_HOUR 0 MAX_UPDATES_PER_HOUR 0 MAX_USER_CONNECTIONS 0 ;

Upload the annotation mysql infile using the script ‘upload_annotation.pl’

upload_annotation.pl -c DATABASE_CONFIG_FILE -i ANNOTATION_INFILE -s SIGN_OF_A_ORGANISM

Upload the annotation chromosome length infile using the script ‘upload_chr_len.pl’

upload_chr_len.pl -c DATABASE_CONFIG_FILE -i CHROMOSOME_LENGTH_INFILE -s SIGN_OF_A_ORGANISM

Users can view the status of newly generated tables in “Genome->Gene model” and “Genome->Chromosome Info”.

To enable the search function, users need to move the fasta format genome file to the position 'res/genomes' in the web folder and rename the file in the format of [sign of the organism].fa

mv genome.fa web/res/genomes/[sign of the organism].fa

To move the fasta format CDS file to the position 'res/cds' in the web folder, rename the file in the format of [sign of the organism].fa, and make blast use database.

mv cds.fa web/res/genomes/[sign of the organism].fa
../programs/makeblastdb -dbtype nucl -in [sign of the organism].fa -out [sign of the organism].fa

Add test statistics

Users can add a track in “Test statistics->Track list”. A single track indicates only one test statistic, while composite track indicates more than one test statistics.
If users newly set a composite track, they need to add subtracks in “Test statistics->Subtrack”.
The sign of the track needs to be alphabetic in lower-case and without blank spaces. The full name will be displayed in the main panel of SWAV. Track color is the color of the curves and points in the drawing of the test statistics. Track height is the height of the track image. If more than one track exists, ‘Rank’ will adjust their order from high to low.
Users need to upload track data to the tables listed in “Test statistics->Data”. The script ‘upload_indicator.pl’ in the folder ‘prepare’ functions to perform the uploading.

upload_tracks.pl -c DATABASE_CONFIG_FILE -i TRACK_FILE -t TABLE_NAME -p PARAMETER_STRING 
The format of PARAMETER_STRING is [the first line to get information]:[column of chromosome]:[column of position]:[column of track value]
e.g., upload_tracks.pl -c conf -i theta.dom -t indicator_scerevisiae_theta_dom -p 2:1:2:3

In “Test statistics->Cutoffs”, users can set lines of cutoffs, which will be displayed in the main page.

the script get_cutoff.pl can help to obtain a designated cutoff for ranking test.

get_cutoff.pl -i TRACK_FILE -c CUTOFF_PERCENTAGE -p PARAMETER_STRING 
The format of PARAMETER_STRING is [the first line to obtain information]:[column of track value]
The cutoff percentage is the percentage of track values in descending order 
e.g., get_cutoff.pl -i theta.dom -c 0.05 -p 2:3

“Test statistics->Chg Password” is the section to change the password of user ‘root’.

Functions

Browser

On the board, clicking on one gene model or one point of some test statistics, its related information is displayed.
Each test statistic may have one track (single track) or more multiple tracks (multi-track). For multitrack, track names are numbered and listed following the test statistic name.
Users can set one or more cutoffs and display the cutoff line in the main pane.

Button/Inputs Function
Choose an organism
Set the position to view
Direct to the view of selected organism and inputted position
Move the genome position to the left
Move the genome position to the left quickly
Move the genome position to the right quickly
Move the genome position to the right
Zoomin the view
Zoomout the view
This button enables a user to set the focus bar. In the pop-up dialog, users can choose to draw a focus bar range from position X to position Y or not to draw a focus bar range. Users can set the color, position and width of the bar.
This button enables a user to export the drawing of one track. In its pop-up dialog, users can choose to include or exclude gene annotation, cutoff line or fitting curves, and to set the size of the generated image. The size of the image need to be larger than 1000*600. After clicking “Preview & Download”, a new page to preview the drawing is generated. Clicking one "Download" will start to download the image.
Export one test statistic in the view in csv format

Search

BLAT/Blast Users can BLAT a sequence or a list of sequences in fasta format to the genome of selected organism and view tracks in the region mapped to one query sequence. The query sequence will be in the color orange in the browser. Users can choose to include the query sequence in the generated image.
By Positions After inputting a list of genome sites, users can obtain a list of links to view the tracks at inputted sites.
By Gene names Users input a list of gene/transcript names and obtain a list of links to view the tracks of the list of genes

Sample

Files

Files in the folder 'install/samples/' help users to build their SWAV.

File Description
scerevisiae.gff3 The GFF3 format file of Saccharomyces cerevisiae from Ensembl
scerevisiae.gtf The GTF format file of Saccharomyces cerevisiae from Ensembl
scerevisiae_genome.fa The genome of Saccharomyces cerevisiae from Ensembl
theta.dom This file is the result of Thetas, Tajima's D and Neutrality tests of domesticated yeast genomes (from PRJEB1973) by ANGSD
theta.wild This file is the result of Thetas, Tajima's D and Neutrality tests of wild yeast genomes (from PRJEB1973) by ANGSD
clr.list.dom This file is the result of CLR tests of domesticated yeast genomes (from PRJEB1973) by ANGSD and SweepFinder2
clr.list.wild This file is the result of CLR tests of wild yeast genomes (from PRJEB1973) by ANGSD and SweepFinder
fst This file is the result of Fst tests of wild yeast genomes (from PRJEB1973) by ANGSD

Pipeline

The following general pipeline can be utilized to perform population genetics analysis of 12 yeast resequencing genomes. For specifics, please refer to the manuals of Bowtie2, Samtools, ANGSD and SweepFinder2

1. Clean the download genome file

pure_chr.pl Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa sc_genome.fa

2. Perform alignment of fastq files and the reference genome

bowtie2-build sc_genome.fa index
for i in *.fastq; do bowtie2 -p 8 -x ../index $i -S $i.sam; done;
samtools faidx sc_genome.fa
for i in *.sam; do samtools view -bt ../sc_genome.fa.fai $i > $i.bam; done;
for i in *.bam; do samtools rmdup -sS $i $i.clean; done;
for i in *.bam; do samtools sort $i -o $i.sorted; done;
for i in *.sorted; do mv $i `echo $i | sed -e "s/\.fastq\.sam\.bam\.sorted/.bam/g"`; done;

3. Make two files dom.txt and wild.txt listing sorted bam files.

4. Perform Thetas,Tajima's D' and Neutrality tests

angsd -bam dom.txt -doSaf 1 -anc /genome/zhuzl/yeast/sc_genome.fa -GL 1 -P 24 -out out
~/bin/angsdfiles/angsd/misc/realSFS out.saf.idx -P 24 > out.sfs
angsd -bam dom.txt -out outdom -doThetas 1 -doSaf 1 -pest out.sfs -anc /genome/zhuzl/yeast/sc_genome.fa -minMapQ 30 -minQ 20 -GL 1
misc/thetaStat do_stat outdom.thetas.idx
misc/thetaStat do_stat outdom.thetas.idx -win 1000 -step 100  -outnames 
misc/thetaStat do_stat outdom.thetas.idx -win 1000 -step 100 -outnames theta.thetasWindow.gz

5. Call CLR

angsd -bam dom.txt -GL 1 -out out -doMaf 2 -minMapQ 30 -minQ 20 -minMaf 0.01 -fold 1 -P 12 -SNP_pval 0.05 -doMajorMinor 1
gunzip out.mafs.gz
SweepFinder2 -sg 1000 dom.txt clr.list.dom
SweepFinder2 -sg 1000 dom.txt clr.list.dom

6. Call FST

angsd -b dom.txt  -anc ../sc_genome.fa -out pop1 -dosaf 1 -gl 1
angsd -b wild.txt  -anc ../sc_genome.fa -out pop2 -dosaf 1 -gl 1
angsd -b list1  -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1
angsd -b list2  -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1
misc/realSFS pop1.saf.idx pop2.saf.idx >pop1.pop2.ml
misc/realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1.pop2.ml -fstout here
misc/realSFS fst stats here.fst.idx 
misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow				
Copyright@ 2018-2023    Any Comments and suggestions mail to:zhuzl@cqu.edu.cn