Correlograms Using R Studio - Visualize Your Correlations

Visualize!

    Yes, we have heard this term time and again in motivational speeches: those who can visualize certainly have a better shot at life. Similarly, for us researchers, visualizing our research data can help us stand out from colleagues who are probably still stuck with tables. This one habit can really change our perspective and lets us view our own data from a brand new angle. Not to mention that our manuscripts may gain a slight edge over others. Anyway, it's a really cool way to present our work before an engaged audience.

Correlation?

    Correlation is a simple statistical measure that lets us establish the relationship between the variables under study or, in plant breeders' terms, between the traits of interest. The result of a correlation analysis shows whether the traits are positively or negatively correlated with each other. I am not going into detail about the theory here; I will jump straight to the commands and R studio.

R Studio?

    R studio is a user-friendly interface developed to make running R commands easier. Before R studio, commands were fed directly into the R console, which is more complicated (for novice users like me) and requires a solid knowledge of the R language. At present, many established publishers require data analysis done in R, and researchers can play with their data at will, provided they know the fundamentals.

Correlogram?

    A correlogram is a graphical representation of our correlation data. Many R scripts for drawing correlograms are available online; I tried several of them and combined them to reach the desired outcome.

1. Install R program and R studio

  • Install the R program suitable for your system. Keep in mind that R studio works only with R version 3.0.1 or above.
  • Link to download R program: https://cran.r-project.org/
  • Now install R studio. There are both paid and open-source versions; choose the one you need. I am using the stand-alone free version.
  • Link to download R studio: https://rstudio.com/products/rstudio/download/
  • After installing, open R studio and set your working directory from the "Session" menu in the main menu bar.
  • I am not going to discuss the other basic features and options of R studio beyond those related to correlation. Many videos are available that are easy to follow.
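
A quick way to confirm the setup from the console is sketched below. It uses only base R; the commented-out folder path is hypothetical and should be replaced with your own.

```r
# Check that the installed R meets the 3.0.1 requirement mentioned above
r_ok <- getRversion() >= "3.0.1"
print(R.version.string)
print(r_ok)

# Show the current working directory (the same setting as in the Session menu)
wd <- getwd()
print(wd)

# To change it from the console, uncomment and edit this hypothetical path:
# setwd("C:/Users/yourname/Documents/correlogram")
```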

2. Data file format

  • Type the data in a spreadsheet with one column per trait (the example here also has a genotype column). Mostly unreplicated data is used for correlation analysis (simple correlation), as only phenotypic correlation is shown in the correlogram.
  • Save the file in Excel format itself; R studio can import .xlsx files directly.

3. Import data to R studio

  • Import the data from the "Import Dataset" menu; since our data is in Excel, choose the "From Excel" option.
  • Once you select the data file, a data preview is shown. Click the drop-down on the genotype column and choose "skip", as we are not treating genotype as a variable in the correlation. Then click the "Import" button at the bottom right of the window.
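
The same import can also be sketched in code. This assumes the readxl package is installed and that the workbook is called nirmal.xlsx and sits in the working directory. The file name follows the example data used here; the column names and the stand-in values are made up purely for illustration.

```r
path <- "nirmal.xlsx"  # assumed file name, located in the working directory

if (file.exists(path) && requireNamespace("readxl", quietly = TRUE)) {
  # Read the first sheet of the workbook into a plain data frame
  nirmal <- as.data.frame(readxl::read_excel(path))
} else {
  # Stand-in data (made-up values) so the sketch runs without the workbook
  nirmal <- data.frame(
    Genotype    = c("G1", "G2", "G3", "G4", "G5"),
    PlantHeight = c(92.1, 88.4, 101.3, 95.0, 90.2),
    GrainYield  = c(4.2, 3.9, 5.1, 4.6, 4.0)
  )
}

# Equivalent of choosing "skip" for the genotype column: keep numeric columns only
nirmal <- nirmal[, sapply(nirmal, is.numeric), drop = FALSE]
str(nirmal)
```

Dropping every non-numeric column is a blunt rule; if your sheet has numeric columns that are not traits (plot numbers, for example), remove those by name as well.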

4. R commands for correlogram - Standard and Mixed model

4.1. Install package corrplot and ggcorrplot:

install.packages("corrplot")

install.packages("ggcorrplot") 

4.2. Call library corrplot (Opens the installed package):

library(corrplot)

4.3. Find correlation matrix

cor(nirmal)

cr<-cor(nirmal)

Note: "nirmal" is the name R studio gives the imported data, taken from the Excel file name. R is case-sensitive, so every name must be typed exactly, with no spelling mistakes. The first command prints the correlation matrix in the R console; the second saves it as "cr", which is convenient and easier to type in later commands.
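
To see what cor() returns, here is a small sketch on R's built-in mtcars data, used as a stand-in because the original spreadsheet is not reproduced here. The method and use arguments are written out explicitly: "pearson" is the default method, and use = "complete.obs" is one way of handling missing values.

```r
traits <- mtcars[, c("mpg", "hp", "wt", "qsec")]  # stand-in for "nirmal"

# Pearson correlation between every pair of columns
cr_demo <- cor(traits, method = "pearson", use = "complete.obs")

# The matrix is symmetric with 1s on the diagonal; rounding makes it easier to read
print(round(cr_demo, 2))
```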

4.4. Standard model correlogram - Simple graph

corrplot(cr,method = "circle")


4.5. Standard correlogram - Graph with more specifications

corrplot(cr,method="circle", type="full", order="original", col=NULL, bg="white", diag=TRUE, addgrid.col=NULL, addCoef.col="white", tl.pos = "lt", tl.cex = 0.8, tl.col=NULL, cl.pos = "r")

4.6. Mixed model correlogram - With two different methods on both sides of the diagonal

corrplot.mixed(cr, lower = "circle", upper = "number", tl.pos = "lt", tl.cex = 0.8, tl.col = "black", diag = "n", bg="white", addgrid.col = "grey")

4.7. Mixed model correlogram - With significance

library(ggcorrplot)

cor_pmat(nirmal)

p.mat<-cor_pmat(nirmal)

Note: The ggcorrplot library can be loaded now or at the start of the session. To include significance we additionally need a matrix of p-values (probabilities). Here it is computed with cor_pmat() from "ggcorrplot"; "corrplot" offers cor.mtest() for the same purpose. As with the correlation matrix, we compute the "p" matrix and, for convenience, name it "p.mat".

corrplot.mixed(cr, lower = "circle", upper = "number", p.mat = p.mat, insig = "blank", tl.pos = "lt", tl.cex = 0.8, tl.col = "black", diag = "n", bg="grey", addgrid.col = "grey")
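
For readers curious about what the p-value matrix contains, the sketch below builds one with base R's cor.test(), one pair of columns at a time. The helper name p_matrix is hypothetical, mtcars stands in for the real data, and the result is meant to illustrate the idea rather than reproduce cor_pmat() exactly.

```r
# Hypothetical helper: p-value of the correlation test for every pair of columns
p_matrix <- function(df) {
  n <- ncol(df)
  p <- matrix(0, n, n, dimnames = list(names(df), names(df)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Fill both halves so the matrix stays symmetric
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]])$p.value
    }
  }
  p
}

traits <- mtcars[, c("mpg", "hp", "wt", "qsec")]  # stand-in data
p_demo <- p_matrix(traits)
print(signif(p_demo, 3))

# TRUE marks the pairs that insig = "blank" hides at the default 0.05 level
print(p_demo > 0.05)
```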

5. Summarized form of commands
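
Putting the commands from section 4 together gives roughly the script below. It assumes both packages are installed and that the data has already been imported as "nirmal" with only numeric columns; if no such object exists, the built-in mtcars data is used as a stand-in so the script still runs.

```r
# One-time installation (uncomment if the packages are missing)
# install.packages("corrplot")
# install.packages("ggcorrplot")

library(corrplot)
library(ggcorrplot)

# Stand-in data if "nirmal" has not been imported (assumption for illustration)
if (!exists("nirmal")) nirmal <- mtcars[, 1:6]

cr <- cor(nirmal)           # correlation matrix
p.mat <- cor_pmat(nirmal)   # matrix of p-values

# Standard correlogram
corrplot(cr, method = "circle")

# Mixed correlogram: circles below the diagonal, numbers above
corrplot.mixed(cr, lower = "circle", upper = "number",
               tl.pos = "lt", tl.cex = 0.8, tl.col = "black",
               diag = "n", bg = "white", addgrid.col = "grey")

# Mixed correlogram with insignificant correlations left blank
corrplot.mixed(cr, lower = "circle", upper = "number",
               p.mat = p.mat, insig = "blank",
               tl.pos = "lt", tl.cex = 0.8, tl.col = "black",
               diag = "n", bg = "grey", addgrid.col = "grey")
```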


6. Facts to keep in mind

  • To run a command, type it and press Ctrl+Enter, or click the "Run" button above the R script.
  • The methods available for drawing the graph are "circle", "square", "ellipse", "number", "shade", "color" and "pie". We can try different combinations as we wish.
  • In the correlogram showing significance, I used the "blank" option, where the insignificant values are left blank. Other options for the significance display are "n", "label_sig", "pch" and "p-value".
  • You can also try various colours. If you encounter an error in a command, R studio gives hints about it and in many cases suggests the correct command.
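
As an illustration of the other significance options, the sketch below marks significant correlations with stars using insig = "label_sig" instead of blanking the rest. It uses corrplot's own cor.mtest() for the p-values and mtcars as stand-in data; the method, colour and size settings are arbitrary choices.

```r
library(corrplot)

traits <- mtcars[, 1:6]    # stand-in for the imported data
cr_demo <- cor(traits)
p_demo <- cor.mtest(traits, conf.level = 0.95)$p   # p-value matrix

# One, two or three stars for p below 0.05, 0.01 and 0.001
corrplot(cr_demo, method = "color", p.mat = p_demo,
         insig = "label_sig", sig.level = c(0.001, 0.01, 0.05),
         pch.cex = 0.9, pch.col = "white", tl.col = "black")
```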

"Hope that was useful. If you have any improvements to suggest, feel free to comment or share."

        


