Correlograms Using RStudio - Visualize Your Correlations
Visualize!
Correlation?
Correlation is a simple statistical measure of the relationship between the variables under study or, in a plant breeder's terms, between the traits of interest. The result of a correlation analysis shows whether two traits are positively or negatively correlated with each other. I am not going into the details here and will jump straight to the commands and RStudio.
RStudio?
RStudio is a user-friendly interface developed to make R commands easier to run. Before RStudio, commands were typed directly into the R console, which is more complicated (for novice users like me) and requires a solid knowledge of the R language. At present, many established publishers require data analysis done in R, and researchers can explore their data at will once they know the fundamentals.
Correlogram?
A correlogram is the graphical representation of our correlation data. Many R scripts for drawing correlograms are available online. I have tried several of them and combined them to reach the desired outcome.
1. Install R program and RStudio
- Install the R version suitable for your system. Keep in mind that RStudio works only with R version 3.0.1 or later.
- Link to download R program: https://cran.r-project.org/
- Now install RStudio. There are both paid and open-source versions; choose the one you need. I am using the stand-alone free version.
- Link to download RStudio: https://rstudio.com/products/rstudio/download/
- After installing, open RStudio and set up your working directory from the "Session" menu in the main bar.
- I am not going to discuss the basic features and options of RStudio other than the ones related to correlation. Many videos that are easy to follow are also available.
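The working directory can also be set with commands instead of the Session menu. A minimal sketch, assuming a hypothetical project folder; a temporary folder stands in for it here, so replace it with your own path:

```r
# Hypothetical project folder; a temporary directory is used so the example runs anywhere.
project_dir <- file.path(tempdir(), "correlogram_project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

old_dir <- getwd()      # remember the starting directory
setwd(project_dir)      # same effect as setting it from the Session menu
print(getwd())          # confirm the new working directory
list.files()            # files R can now read by name alone (none in a new folder)
```

Files saved in this folder can then be opened by file name without typing the full path.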
2. Data file format
- Type the data as above. Mostly unreplicated data is used for correlation analysis (simple correlation), since only phenotypic correlation is shown in a correlogram.
- Save the file in Excel format itself. RStudio is compatible even with the .xlsx extension.
3. Import data to RStudio
- Import the data. We are using data from Excel, so click on the option, as highlighted.
- Once you select the data file, a data preview is shown. First click on the drop-down option on the genotype column, then click skip, as we are not considering genotype as a variable here in correlation. Now click on the import button at the bottom right of the window.
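The effect of skipping the genotype column can also be reproduced with a command. The sketch below is only an illustration: the small data frame is made up to stand in for the Excel sheet, and its column names are hypothetical. With a real file, read_excel() from the readxl package would supply the data frame, as noted in the comments.

```r
# With a real file, the data frame would come from readxl, for example:
#   library(readxl)
#   nirmal_raw <- read_excel("nirmal.xlsx")
# Here a made-up data frame stands in for the imported sheet.
nirmal_raw <- data.frame(
  genotype     = c("G1", "G2", "G3", "G4", "G5"),
  plant_height = c(92.1, 88.4, 101.3, 95.0, 90.2),
  tillers      = c(11, 9, 14, 12, 10),
  yield        = c(24.5, 21.0, 30.2, 26.8, 22.9),
  stringsAsFactors = FALSE
)

# Keep only the numeric trait columns; this mirrors choosing "skip" for genotype.
nirmal <- nirmal_raw[sapply(nirmal_raw, is.numeric)]
str(nirmal)
```

Dropping the text column matters because cor() accepts numeric columns only.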
4. R commands for correlogram - Standard and Mixed model
4.1. Install packages corrplot and ggcorrplot:
install.packages("corrplot")
install.packages("ggcorrplot")
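Packages need to be installed only once. A small sketch that installs the two packages only when they are missing; the helper name missing_packages is made up for illustration:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("corrplot", "ggcorrplot"))

# Install only what is missing, and only in an interactive session such as RStudio.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```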
4.2. Call library corrplot (loads the installed package):
library(corrplot)
4.3. Find correlation matrix
cor(nirmal)
cr<-cor(nirmal)
Note: "nirmal" is the name of the Excel file, which RStudio also uses as the name of the imported data set. Every name must be typed without spelling mistakes. The first command prints the correlation matrix in the R console. The second saves the correlation matrix as "cr", which is easier to type in further commands.
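To see what cor() returns, here is a small sketch on made-up numbers. The data frame and its trait names are hypothetical; with real data, nirmal is the imported sheet.

```r
# Made-up trait data standing in for the imported data set.
nirmal <- data.frame(
  plant_height   = c(92.1, 88.4, 101.3, 95.0, 90.2, 97.6),
  tillers        = c(11, 9, 14, 12, 10, 13),
  days_to_flower = c(78, 82, 70, 75, 80, 73)
)

cr <- cor(nirmal)      # Pearson correlation by default
print(round(cr, 2))    # rounded for easier reading

# The matrix is symmetric with 1 on the diagonal; the sign gives the direction.
print(cr["plant_height", "tillers"] > 0)          # the two traits rise together
print(cr["plant_height", "days_to_flower"] < 0)   # one rises as the other falls
```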
4.4. Standard model correlogram - Simple graph
corrplot(cr,method = "circle")
corrplot(cr,method="circle", type="full", order="original", col=NULL, bg="white", diag=TRUE, addgrid.col=NULL, addCoef.col="white", tl.pos = "lt", tl.cex = 0.8, tl.col=NULL, cl.pos = "r")
4.6. Mixed model correlogram - With two different methods on both sides of the diagonal
library(ggcorrplot)
cor_pmat(nirmal)
p.mat<-cor_pmat(nirmal)
corrplot.mixed(cr, lower = "circle", upper = "number", p.mat = p.mat, insig = "blank", tl.pos = "lt", tl.cex = 0.8, tl.col = "black", diag = "n", bg="grey", addgrid.col = "grey")
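The p.mat argument holds one p-value for each pair of traits, and insig = "blank" hides the pairs that are not significant. As a rough sketch of what such a matrix contains, the base R function cor.test() can be applied to every pair of columns. This is an illustration on made-up data, not the internals of cor_pmat(), and the helper name pvalue_matrix is hypothetical.

```r
# Hypothetical helper: p-value of the correlation test for every pair of columns.
pvalue_matrix <- function(df) {
  n <- ncol(df)
  p <- matrix(0, n, n, dimnames = list(names(df), names(df)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]])$p.value
    }
  }
  p  # the diagonal is left at 0 (a trait compared with itself)
}

# Made-up data: tillers depends on height, noise is unrelated to both.
set.seed(42)
traits <- data.frame(height = rnorm(20, mean = 95, sd = 5))
traits$tillers <- 0.2 * traits$height + rnorm(20, sd = 0.5)
traits$noise   <- rnorm(20)

p.mat <- pvalue_matrix(traits)
print(round(p.mat, 3))
print(p.mat < 0.05)   # TRUE marks pairs that would stay visible at the 5% level
```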
5. Summarized form of commands
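Collected in one place, the commands from section 4 can be run as a single script. In this sketch a made-up data frame stands in for the imported Excel data, so the trait names and values are assumptions. The plotting calls are the ones used in section 4 and need the corrplot and ggcorrplot packages.

```r
# Made-up data standing in for the imported data set "nirmal".
set.seed(1)
nirmal <- data.frame(plant_height = rnorm(30, mean = 95, sd = 5))
nirmal$tillers        <- 0.2 * nirmal$plant_height + rnorm(30, sd = 0.8)
nirmal$yield          <- 0.3 * nirmal$plant_height + rnorm(30, sd = 2)
nirmal$days_to_flower <- rnorm(30, mean = 76, sd = 4)

cr <- cor(nirmal)   # correlation matrix

# The plots are drawn only when both packages are installed.
if (requireNamespace("corrplot", quietly = TRUE) &&
    requireNamespace("ggcorrplot", quietly = TRUE)) {
  library(corrplot)
  library(ggcorrplot)

  p.mat <- cor_pmat(nirmal)   # p-values for each pair of traits

  # Standard correlogram
  corrplot(cr, method = "circle")

  # Mixed correlogram: circles below the diagonal, numbers above
  corrplot.mixed(cr, lower = "circle", upper = "number", p.mat = p.mat,
                 insig = "blank", tl.pos = "lt", tl.cex = 0.8,
                 tl.col = "black", diag = "n", bg = "grey",
                 addgrid.col = "grey")
}
```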
6. Facts to keep in mind
- To run a command, type it and press Ctrl+Enter or click on the "Run" button above the R script.
- The methods which can be employed to represent the graph include "circle", "square", "ellipse", "number", "shade", "color", "pie". We can try different combinations as per our wish.
- To show the significance level, I have used the "blank" option, where the insignificant values are erased. Other options for the significance level include "n", "label_sig", "pch", "p-value".
- You can also try various colours. If you encounter an error in a command, RStudio gives you hints about the error, and in many cases it can also suggest the correct command.
"Hope that was useful. If you have any improvements other than this, feel free to comment or share."
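To compare the drawing methods listed above side by side, one plot per method can be drawn in a loop. A minimal sketch, assuming the corrplot package is installed and using R's built-in mtcars data as a stand-in for your own traits:

```r
cr <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])  # built-in example data

methods <- c("circle", "square", "ellipse", "number", "shade", "color", "pie")

# The plots are drawn only when corrplot is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  old_par <- par(mfrow = c(2, 4))   # a 2 x 4 grid holds the seven plots
  for (m in methods) {
    corrplot::corrplot(cr, method = m, title = m, mar = c(0, 0, 2, 0))
  }
  par(old_par)                      # restore the previous layout
}
```

Seeing all seven together makes it easier to pick the pair to use on either side of the diagonal in the mixed model.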