efg's Research Notes:  R TechNotes and Graphics Gallery

Venn Diagram

Earl F. Glynn
Stowers Institute for Medical Research
18 March 2005

 

Purpose

This TechNote explains how to form a Venn diagram from three different sets, such as three different sets of gene identifiers from a microarray experiment. A similar function for two sets is given in the Discussion.

 

Background

Familiarity with simple set theory and Venn diagrams is assumed.

 

Software Requirements

R 2.0.1
limma Bioconductor package

 

Step-by-Step Procedure

 

  1. Tell R we need the limma package:

> library(limma)

  2. Define three sets of identifiers and a set of additional identifiers that may be part of the "universe":

> set1 <- c("a","b","c","e", "f")
> set2 <- c("a","b","d")
> set3 <- c("a","c","d","e", "g", "h")
> extra <- c("x", "y", "z")

 

  3. Note that the sets here are assumed to have unique members. If your list may contain repeated elements, use R's unique function to remove the duplicates. For example:
    > bag2 <- c("a", "b", "b", "b", "d") # element "b" repeated 3 times
    > set2 <- unique(bag2)
    > set2
    [1] "a" "b" "d"

    If the elements are not all unique, you have a "bag" (multiset), not a "set".
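Base R can also tell you whether a vector is already a set: anyDuplicated returns 0 when no element is repeated. A minimal sketch; the helper name is_set is hypothetical and is not part of R or limma:

```r
# Hypothetical helper: TRUE when x has no repeated elements, i.e. x is a set.
is_set <- function(x) anyDuplicated(x) == 0

bag2 <- c("a", "b", "b", "b", "d")   # element "b" repeated 3 times
is_set(bag2)           # FALSE: a bag, not a set
is_set(unique(bag2))   # TRUE after removing the duplicates

# table shows how many times each element occurs in the bag.
table(bag2)
```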

  4. The R functions intersect, union and setdiff can be used to manipulate sets. Let's use union to form the universal set:

> universe <- sort( union(set1, union(set2, union(set3, extra))) )
> universe
[1] "a" "b" "c" "d" "e" "f" "g" "h" "x" "y" "z"

 

The sort function here isn't strictly necessary, but union keeps elements in the order in which they first appear, so without sort the set looks like this, with element "d" out of order:

> universe <- union(set1, union(set2, union(set3, extra)))
> universe
[1] "a" "b" "c" "e" "f" "d" "g" "h" "x" "y" "z"

 

A simpler alternative to several nested unions is to combine (c) all the vectors, then apply the unique and sort functions:

> universe <- sort( unique( c(set1, set2, set3, extra) ) )
> universe
[1] "a" "b" "c" "d" "e" "f" "g" "h" "x" "y" "z"
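The other set functions mentioned in step 4 pick out individual regions of the diagram directly. A sketch using only base R: Reduce applies union (or intersect) pairwise over a list, which avoids writing the nested calls by hand; the variable names inAll, onlySet1 and outside are illustrative only.

```r
set1 <- c("a","b","c","e", "f")
set2 <- c("a","b","d")
set3 <- c("a","c","d","e", "g", "h")
extra <- c("x", "y", "z")

# Reduce applies union pairwise, replacing the hand-written nested calls.
universe <- sort(Reduce(union, list(set1, set2, set3, extra)))

# Center region: elements found in all three sets.
inAll <- Reduce(intersect, list(set1, set2, set3))

# Region only in set1: drop anything that also appears in set2 or set3.
onlySet1 <- setdiff(set1, union(set2, set3))

# Elements of the universe that lie outside all three circles.
outside <- setdiff(universe, union(set1, union(set2, set3)))

inAll      # "a"
onlySet1   # "f"
outside    # "x" "y" "z"
```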

 

  5. Form a data structure, here called Counts, for use with the vennCounts function. Counts has one row per element of the universe and one column per set; an entry is 1 if the element is in that set and 0 otherwise:

> Counts <- matrix(0, nrow=length(universe), ncol=3)
> colnames(Counts) <- c("set1", "set2", "set3")
> for (i in seq_along(universe))
+ {
+   Counts[i,1] <- universe[i] %in% set1
+   Counts[i,2] <- universe[i] %in% set2
+   Counts[i,3] <- universe[i] %in% set3
+ }

 

> Counts
      set1 set2 set3
 [1,]    1    1    1
 [2,]    1    1    0
 [3,]    1    0    1
 [4,]    0    1    1
 [5,]    1    0    1
 [6,]    1    0    0
 [7,]    0    0    1
 [8,]    0    0    1
 [9,]    0    0    0
[10,]    0    0    0
[11,]    0    0    0
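Because %in% is vectorized, the loop in step 5 can be replaced by one %in% call per set. A sketch that should produce the same 0/1 matrix; multiplying by 1 converts the TRUE/FALSE results to the 1/0 values the loop stores:

```r
set1 <- c("a","b","c","e", "f")
set2 <- c("a","b","d")
set3 <- c("a","c","d","e", "g", "h")
extra <- c("x", "y", "z")
universe <- sort(unique(c(set1, set2, set3, extra)))

# Each column tests every element of the universe against one set at once;
# 1 * converts the logical matrix to numeric 0/1.
Counts <- 1 * cbind(set1 = universe %in% set1,
                    set2 = universe %in% set2,
                    set3 = universe %in% set3)
Counts
```

The loop is easier to follow step by step; the vectorized form is shorter and avoids indexing errors when the universe is empty.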

 

  6. The vennDiagram and vennCounts functions can now be used to create the diagram:

> vennDiagram( vennCounts(Counts) )
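Conceptually, each region of the diagram corresponds to one pattern of 0s and 1s in a row of Counts, and the all-zero pattern is the number printed outside the circles (3 here: "x", "y" and "z"). The sketch below illustrates that tabulation in base R only; it is not limma's implementation of vennCounts, and its output format differs:

```r
set1 <- c("a","b","c","e", "f")
set2 <- c("a","b","d")
set3 <- c("a","c","d","e", "g", "h")
extra <- c("x", "y", "z")
universe <- sort(unique(c(set1, set2, set3, extra)))

# Same 0/1 membership matrix as Counts in step 5.
Counts <- 1 * cbind(set1 = universe %in% set1,
                    set2 = universe %in% set2,
                    set3 = universe %in% set3)

# Label each row with its membership pattern, e.g. "101" = in set1 and set3 only.
patterns <- apply(Counts, 1, paste, collapse = "")

# Tabulate all 8 possible patterns so that empty regions show up as 0.
regions <- c("000", "001", "010", "011", "100", "101", "110", "111")
regionCounts <- table(factor(patterns, levels = regions))
regionCounts
```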

 

 

Discussion

Venn3. The procedure above is reasonably simple, but a "wrapper" function that takes care of the details would be nice:

 

require(limma)

Venn3 <- function(set1, set2, set3, names)
{
  stopifnot( length(names) == 3)

  # Form universe as union of all three sets
  universe <- sort( unique( c(set1, set2, set3) ) )

  # One row per element of the universe, one 0/1 column per set
  Counts <- matrix(0, nrow=length(universe), ncol=3)
  colnames(Counts) <- names

  for (i in seq_along(universe))
  {
    Counts[i,1] <- universe[i] %in% set1
    Counts[i,2] <- universe[i] %in% set2
    Counts[i,3] <- universe[i] %in% set3
  }

  vennDiagram( vennCounts(Counts) )
}

 

One line can now be used to create the Venn diagram from the sets given above:

 

> Venn3(set1, set2, set3, c("A", "B", "C"))

 

 

Venn2. A similar function can be used when working with only two sets:

 

require(limma)

Venn2 <- function(set1, set2, names)
{
  stopifnot( length(names) == 2)

  # Form universe as union of both sets
  universe <- sort( unique( c(set1, set2) ) )

  # One row per element of the universe, one 0/1 column per set
  Counts <- matrix(0, nrow=length(universe), ncol=2)
  colnames(Counts) <- names

  for (i in seq_along(universe))
  {
    Counts[i,1] <- universe[i] %in% set1
    Counts[i,2] <- universe[i] %in% set2
  }

  vennDiagram( vennCounts(Counts) )
}

 

One line is needed for this two-set Venn diagram:

 

> Venn2(set1, set2, c("Set A", "Set B"))
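Venn2 and Venn3 differ only in the number of sets. A sketch of a hypothetical helper, vennMatrix (not part of limma), that builds the same 0/1 membership matrix from a named list of any number of sets. Its result could then be passed to vennDiagram( vennCounts(...) ) as above, provided the number of sets is within what your version of limma's vennDiagram supports (check its documentation).

```r
# Hypothetical helper (not part of limma): build a 0/1 membership matrix
# from a named list of sets, one column per set.
vennMatrix <- function(sets)
{
  stopifnot(is.list(sets), !is.null(names(sets)))

  # Form universe as union of all the sets, as in Venn2 and Venn3
  universe <- sort(unique(unlist(sets, use.names = FALSE)))

  # One column per set: 1 if the element of the universe is in that set
  Counts <- sapply(sets, function(s) as.numeric(universe %in% s))

  # sapply returns a plain vector when the universe has one element;
  # reshape so the result is always a matrix with the set names as columns
  matrix(Counts, nrow = length(universe), dimnames = list(NULL, names(sets)))
}

set1 <- c("a","b","c","e", "f")
set2 <- c("a","b","d")
set3 <- c("a","c","d","e", "g", "h")

Counts <- vennMatrix(list(A = set1, B = set2, C = set3))
Counts
# With limma loaded:  vennDiagram( vennCounts(Counts) )
```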

 

 

The zero in the bottom right corner is annoying. It is the number of members of the universe that are not contained in any of the sets represented by the circles. Because Venn2 and Venn3 form the universe from the sets themselves, this count is always zero, so suppressing it would be nice when we only care about the universe defined by the sets represented by the circles.

 

We can edit the vennDiagram function to get rid of this zero. In the editor, comment out the line that draws this count (put a "#" at the start of that line):

 

> vennDiagram <- edit(vennDiagram)

 

 

In the editor window, choose File | Save, then click the "X" in the upper right corner to exit the editor.

 

> Venn3(set1, set2, set3, c("A", "B", "C"))

 

 

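The annoying zero is now gone. The edited copy of vennDiagram is stored in the R workspace (the global environment), where it is found before the limma version, so the change persists only as long as that workspace is kept.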
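Because the edit lasts only as long as the workspace, one option is to save the edited function to a file with base R's dump and restore it later with source. The sketch below demonstrates the mechanism with a stand-in function, myVenn, and a temporary file; both names are illustrative assumptions, not part of limma:

```r
# Stand-in for the edited function. In practice this would be the copy of
# vennDiagram created with  vennDiagram <- edit(vennDiagram).
myVenn <- function(x) paste("edited", x)

# Write the function's definition to an R source file.
# (tempdir() keeps this demo self-contained; any file name would do.)
editedFile <- file.path(tempdir(), "myVenn_edited.R")
dump("myVenn", file = editedFile)

# Simulate starting over with a fresh workspace ...
rm(myVenn)

# ... then restore the edited function by sourcing the saved file.
source(editedFile)
myVenn("diagram")
```

For the real function, the same idea would be dump("vennDiagram", file = "vennDiagram_edited.R") right after editing, then library(limma) followed by source("vennDiagram_edited.R") in a later session (the file name is hypothetical).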


 


 

Updated 24 Oct 2007