Geeky Researcher

Variant Calling using UseGalaxy Server

Import Data

  • Open the UseGalaxy platform and register for a free account. We can then either import our own research data or use secondary data from public genomic archives such as ENA or NCBI SRA. https://usegalaxy.org/

  • For secondary data, we can browse from within the platform itself. In the tool panel on the left-hand side (LHS), there is a “Get Data” option. Click on it.

  • We can see a list of databases from which we can select sequence reads, either single-end or paired-end.

  • The reads must be in FastQ file format. Here, I am going to select the third option, which fetches the required reads from NCBI SRA.

  • A new dialogue box pops up, where we need to enter the SRR accession number. We can search separately for the required sequencing run and its accession number on NCBI SRA.

  • https://www.ncbi.nlm.nih.gov/sra

  • Once you have entered the required information, click on execute. The FastQ file will appear in the history tab on the right-hand side (RHS).

  • Note that the jobs run on the Galaxy server, so the processing speed depends mainly on the server load and your internet connectivity rather than on your own system specifications.

  • If we select paired-end reads, two separate files appear, and we need to perform a quality check on each individual FastQ file.

  • We can also click on the eye icon to view the sequence read data.
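
To make the FastQ format mentioned above a little more concrete, here is a minimal Python sketch that reads FastQ text, assuming the common layout of exactly four lines per read (header, bases, a "+" line, quality string). The names FastqRecord and parse_fastq and the two example reads are made up for illustration; they are not part of Galaxy or of any real SRA run.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # read identifier (header line without the leading "@")
    sequence: str  # base calls
    quality: str   # ASCII-encoded quality string, one character per base


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield reads from FastQ text, assuming exactly four lines per read."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FastQ text must contain four lines per read")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FastQ record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, qual)


# Made-up example data, only for illustration.
EXAMPLE_FASTQ = """\
@read1
GATTACAGATT
+
IIIIIIIIIII
@read2
ACGTACGTACG
+
IIIIHHHGGFF
"""

records = list(parse_fastq(EXAMPLE_FASTQ))
for record in records:
    print(record.name, len(record.sequence), "bases")
```

For paired-end data, the same function would simply be run once on each of the two files.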

Quality Check

  • In the LHS, under the tools section, type in FASTQC.

  • FastQC is a tool that checks the quality of reads. From its report, we can determine whether the reads are good enough for further analysis.

  • Click on the option “FastQC Read quality reports”

  • In the first box, select the FastQ file that is to be evaluated.

  • If we have an adapter or contaminant list, we can add it in the boxes below.

  • Leave all other options at their default settings. Go to the end of the page and click on the execute button.

  • A total of ten parameters are displayed in the FastQC report, of which per base sequence quality, per base sequence content, and per sequence GC content are the most important ones.

  • Ideally, these three parameters should be marked in green before we proceed further.

  • Here, the per base sequence content is marked in red, so we have to improve the quality of the reads with another tool, Trimmomatic.

  • Click on the per base sequence content (information about the ATGC content).

  • Here, we can see that at the start of the reads, the proportions of A, T, G, and C are very unequal. If we trim that portion, we may end up with good-quality reads.
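
The idea behind the per base sequence content check can be sketched in a few lines of Python: for every position in the reads, count how often A, C, G, and T occur. This is only a simplified illustration and not FastQC's actual implementation; the function name per_base_content and the example reads are made up.

```python
from collections import Counter
from typing import Dict, List


def per_base_content(reads: List[str]) -> List[Dict[str, float]]:
    """Return, for each read position, the fraction of A, C, G and T."""
    if not reads:
        return []
    longest = max(len(read) for read in reads)
    result = []
    for pos in range(longest):
        # Only reads that are long enough contribute to this position.
        counts = Counter(read[pos] for read in reads if pos < len(read))
        total = sum(counts[base] for base in "ACGT")
        result.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return result


# Made-up reads whose first two positions are biased towards G.
EXAMPLE_READS = ["GGACGT", "GGCATG", "GGTGCA", "GGGTAC"]

content = per_base_content(EXAMPLE_READS)
for position, fractions in enumerate(content, start=1):
    print(position, fractions)
```

In this toy data, the first two positions consist only of G, while the later positions are balanced, which mirrors the kind of bias at the start of the reads described above.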

Trimming Reads

  • Search for the tool Trimmomatic under the tools tab and select the second option from the list.

  • In the first box, select whether we are using single-end or paired-end reads.

  • Although we have taken paired-end reads, we are only going to trim a single file here, hence single-end has been selected.

  • If we select paired-end, we can trim both files simultaneously.

  • As we scroll down, we can select the trimming operation that is to be carried out.

  • In the FastQC report, the problem existed only at the start of the reads, hence we have selected “cut bases off the start of a read”. Other options are also available according to one's purpose.

  • Once selected, click on the execute option.

  • The trimmed FastQ file will be available in the RHS.

  • We can perform the quality check again to make sure the problems have been rectified.

  • If not, other trimming options can be tried. If the problems still persist, re-sequencing may have to be carried out.

  • For further analysis, we have to use this trimmed file.

  • Similarly, the same process must be applied to the other FastQ file, as we have taken paired-end reads.
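
What cutting bases off the start of a read does to a single read can be sketched as follows. This is a simplified illustration of the idea (Trimmomatic calls this step HEADCROP), not Trimmomatic's own code; the function name crop_read_start, the example read, and the choice of two bases are assumptions for the example.

```python
from typing import Tuple


def crop_read_start(sequence: str, quality: str, n_bases: int) -> Tuple[str, str]:
    """Remove the first n_bases from a read and from its quality string."""
    if n_bases < 0:
        raise ValueError("n_bases must not be negative")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    # Bases and quality characters are cut together so they stay aligned.
    return sequence[n_bases:], quality[n_bases:]


# Made-up read with two suspicious bases at the start.
trimmed_sequence, trimmed_quality = crop_read_start("GGACGTAC", "##IIIIII", 2)
print(trimmed_sequence, trimmed_quality)
```

The important point is that the quality string is always trimmed by the same amount as the bases, so each remaining base keeps its own quality value.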

Mapping of Reads

  • Search “bwa” in the tools tab.

  • Two tools show up, namely Map with BWA and Map with BWA-MEM. The former is used if the reads are shorter than 100 bp and the latter when the reads are longer than 100 bp.

  • We will select BWA-MEM.

  • Select the reference genome to which the mapping needs to be done.

  • Below that, select whether it is paired-end or single-end.

  • Select the trimmed FastQ files in the subsequent boxes.

  • Select the analysis mode based on the sequencing platform used.

  • Click on execute

  • The mapped file will appear in BAM file format in the RHS, from where we can view it.

  • The BAM file includes the chromosome to which each of these short reads maps, its position and size, and the corresponding sequence.
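
A BAM file is the compressed binary form of the text-based SAM format, so the kind of information stored per read is easiest to show on a single SAM line. The sketch below splits one tab-separated alignment line into a few of its mandatory fields. The names Alignment and parse_sam_line and the example line are made up for illustration; a real BAM file would first have to be converted to SAM text or read with a dedicated library.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    read_name: str   # QNAME
    flag: int        # FLAG (bitwise flags, e.g. 4 means unmapped)
    chromosome: str  # RNAME
    position: int    # POS, 1-based leftmost mapping position
    mapq: int        # MAPQ, mapping quality
    cigar: str       # CIGAR string
    sequence: str    # SEQ


def parse_sam_line(line: str) -> Alignment:
    """Parse the mandatory fields we need from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return Alignment(
        read_name=fields[0],
        flag=int(fields[1]),
        chromosome=fields[2],
        position=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        sequence=fields[9],
    )


# Made-up alignment line, only for illustration.
EXAMPLE_SAM_LINE = (
    "read1\t99\tchr1\t10468\t60\t11M\t=\t10650\t193\tGATTACAGATT\tIIIIIIIIIII"
)

alignment = parse_sam_line(EXAMPLE_SAM_LINE)
is_unmapped = bool(alignment.flag & 4)
print(alignment.chromosome, alignment.position, len(alignment.sequence), is_unmapped)
```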

Variant Calling

  • Search for the tool FreeBayes and select it from the options.

  • Select the BAM file as the input file for variant calling.

  • Again select the appropriate reference genome.

  • Set the parameter selection to simple diploid calling.

  • Click on the execute option at the bottom.

  • The output file is saved in VCF file format.

  • We can see information such as the SNP variations between the reference genome and the sample, along with their positions on the chromosome.

  • The same VCF file can then be submitted for annotation (functional regions) on another server, namely wANNOVAR.
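
To show how the SNP information can be read out of a VCF file, here is a minimal Python sketch that skips the header lines and lists every single-base substitution with its chromosome and position. It only looks at the first five VCF columns (CHROM, POS, ID, REF, ALT); the function name iter_snps and the example records are made up and are not real FreeBayes output.

```python
from typing import Iterator, Tuple

Snp = Tuple[str, int, str, str]  # (chromosome, position, reference base, sample base)


def iter_snps(vcf_text: str) -> Iterator[Snp]:
    """Yield single-base substitutions from VCF text."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError("a VCF record needs at least CHROM, POS, ID, REF, ALT")
        chrom, pos, _variant_id, ref, alt = fields[:5]
        # ALT may list several alleles separated by commas.
        for allele in alt.split(","):
            if len(ref) == 1 and len(allele) == 1 and allele in "ACGT":
                yield chrom, int(pos), ref, allele


# Made-up VCF content, only for illustration.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10468\t.\tG\tA\t50\t.\tDP=20",
    "chr1\t20511\t.\tAT\tA\t40\t.\tDP=18",
    "chr2\t512\t.\tC\tT,G\t60\t.\tDP=25",
])

snps = list(iter_snps(EXAMPLE_VCF))
for chrom, pos, ref, alt in snps:
    print(f"{chrom}:{pos} {ref}>{alt}")
```

The second example record is a deletion (AT to A), so it is not reported as an SNP, while the third record contributes two SNPs because it lists two alternative alleles.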

See you again in another plantbreederspot content!!!!