Scatterplot with different colours using RStudio
Scatterplots
Scatterplots typically show the relationship between two variables in a data set. Researchers use them to visualize the trend and behaviour of the variables, and we plant breeders use them often in our papers. I was working through a data set when a colleague asked me for a scatter plot that distinguished the genotypes by their tolerance level. A basic scatter plot was easy in RStudio, but colouring the points by tolerance was a challenge for me (in big letters: I am a layman in RStudio, so it was difficult). I browsed the available examples and code and combined them into a simple procedure that may be useful for beginners.
*Note: My previous blog post on visualizing correlation gives a brief introduction to RStudio, its installation, and the data file format, so I jump straight into the procedure here.
1. Data file format
- Type the data in columns. In my file the columns are "Genotypes", "RS", and, as the third column, "Tol", which holds the tolerance score or its abbreviation.
- Save the file in Excel format. RStudio can import files with the .xlsx extension directly.
2. Import data to RStudio
- In the Environment pane, choose Import Dataset > From Excel and browse to the data file.
- When you select the data file, a data preview is shown. Click the Import button at the bottom right of the window.
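For readers who prefer typing the import instead of clicking through the dialog, the same step can be sketched in code. This is a minimal sketch, not the exact code RStudio generates: it assumes the readxl package is installed, and the helper name import_scatter and the file name "scatter.xlsx" are hypothetical placeholders.

```r
# Sketch: read an Excel file into a data frame.
# import_scatter() is a hypothetical helper, not part of any package.
import_scatter <- function(path) {
  # Stop early with a clear message if the path is wrong
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # read_excel() comes from the readxl package
  # (run install.packages("readxl") once if it is missing)
  readxl::read_excel(path)
}

# Usage (uncomment and adjust the placeholder path to your own file):
# scatter <- import_scatter("scatter.xlsx")
# head(scatter)  # check that the Genotypes, RS and Tol columns are present
```

Naming the result "scatter" keeps it consistent with the commands in the next step.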
3. R commands for scatter plot
3.1. Install package ggplot2 (needed only once):
install.packages("ggplot2")
3.2. Load the package ggplot2 (makes the installed package available in the current session):
library(ggplot2)
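3.3. Assign variables to the X and Y axes:
x <- scatter$Genotypes
y <- scatter$RS
*Note: "RS" and "Genotypes" are the column names in my data file; change them to match your own. "scatter" is the name of the imported data, which RStudio takes from the Excel file name. R is case-sensitive, so type the name exactly as it appears in the Environment pane.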
- "aes" is the aesthetic mapping function. It controls how the variables in a data set are mapped to visual properties of the plot, such as position and colour.
- The tolerance factor ("Tol") is named in the mapping so that the plot shows each class in a different colour. HT - Highly tolerant, T - Tolerant, MT - Moderately tolerant, S - Susceptible.
- We can also use any other factor (other than tolerance) to distinguish groups within the plot.
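The points above can be put together in one plotting command. What follows is a minimal sketch rather than the exact command I used: the small data frame is a hypothetical stand-in with made-up values so that the example runs on its own, and the point size and axis labels are my own assumptions. Replace the stand-in with your imported "scatter" data.

```r
library(ggplot2)

# Hypothetical stand-in data (made-up values); use your imported data instead
scatter <- data.frame(
  Genotypes = c("G1", "G2", "G3", "G4"),
  RS        = c(1.5, 2.5, 3.5, 4.5),
  Tol       = c("HT", "T", "MT", "S")
)

# Store Tol as a factor so the legend follows the tolerance order
scatter$Tol <- factor(scatter$Tol, levels = c("HT", "T", "MT", "S"))

# aes() maps Genotypes to the x axis, RS to the y axis and Tol to colour
p <- ggplot(scatter, aes(x = Genotypes, y = RS, colour = Tol)) +
  geom_point(size = 3) +
  labs(x = "Genotypes", y = "RS", colour = "Tolerance")

print(p)
```

To distinguish the points by a different factor, put that column name in place of Tol inside aes().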