efg's Research Notes

 


FCSExtract Utility

for Flow Cytometry Standard (FCS) data
 
Earl F Glynn
Stowers Institute for Medical Research

 


Download (available under GNU General Public License): 
FCS Extract 1.02 (15 Feb 2006)   •  Test Data  •  Delphi 7 Source Code

Sample Output File Set:  .csv  and .txt

 

Also see R alternative to FCS Extract (6 Oct 2005)


Purpose
The purpose of the FCSExtract utility is to extract Flow Cytometry Standard (FCS) data from a binary FCS file and convert it to an ASCII text format, which can be read by a variety of software analysis packages.  An example of working with this data in R is provided.

Background
You may be frustrated if you attempt to analyze the flow cytometry data in an FCS file using a third-party analysis tool. An FCS file is binary: it contains many non-ASCII bytes that are unprintable and that many programs cannot read. You cannot simply load an FCS file into Excel or another analysis tool.
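To show what "binary" means here, the following is a minimal R sketch (not taken from FCSExtract's Delphi source) that reads the fixed-size HEADER segment at the start of an FCS file. It assumes the FCS 2.0/3.0 layout: a 6-character version string, 4 spaces, then six right-justified 8-character ASCII byte offsets giving the start and end of the TEXT, DATA, and ANALYSIS segments. The function name read_fcs_header is hypothetical.

```r
# Minimal sketch (hypothetical helper): read the fixed 58-byte FCS HEADER.
# Assumed layout: bytes 0-5 version, bytes 6-9 spaces, then six
# right-justified ASCII byte offsets of 8 characters each.
read_fcs_header <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  hdr <- rawToChar(readBin(con, what = "raw", n = 58))
  # substring() positions are 1-based and inclusive
  starts <- seq(11, 51, by = 8)
  offsets <- as.numeric(trimws(substring(hdr, starts, starts + 7)))
  names(offsets) <- c("text_begin", "text_end", "data_begin",
                      "data_end", "analysis_begin", "analysis_end")
  list(version = substr(hdr, 1, 6), offsets = offsets)
}
```

As we understand FCS 3.0, a DATA offset too large for 8 digits is written as 0 in the HEADER and given instead by the TEXT keywords $BEGINDATA and $ENDDATA, so a fuller reader would check for that case.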

The FCS binary data file follows a standard:

Reference Material

We implemented only the parts of the FCS standard needed to process our files, although in Feb 2006 we extended the program to handle floating-point data in addition to integer data. You could encounter problems with FCSExtract if your FCS files rely on parts of the standard that we ignored or only partially implemented. We are not aware of any "test suite" for certifying software that processes FCS files.
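The TEXT segment holds keyword/value pairs such as $DATATYPE, separated by a delimiter character that is the first byte of the segment. Here is a minimal R sketch of splitting such a string into a named vector. parse_fcs_text is a hypothetical name, and the sketch assumes the delimiter never appears doubled (escaped) inside a keyword or value, a case the standard allows and a robust reader would need to handle.

```r
# Minimal sketch: split an FCS TEXT segment into keyword/value pairs.
# Assumption: no escaped (doubled) delimiters inside keywords or values.
parse_fcs_text <- function(text) {
  delim <- substr(text, 1, 1)                      # first byte is the delimiter
  body <- substr(text, 2, nchar(text))
  fields <- strsplit(body, delim, fixed = TRUE)[[1]]
  keys <- fields[c(TRUE, FALSE)]                   # odd positions: keywords
  vals <- fields[c(FALSE, TRUE)]                   # even positions: values
  setNames(vals, toupper(keys))                    # normalize keyword case
}

kw <- parse_fcs_text("/$DATATYPE/I/$PAR/10/$BYTEORD/1,2,3,4/")
kw[["$DATATYPE"]]   # "I"
```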

While the .FCS file extension is commonly used for flow cytometry data, at least 9 other programs define their own, unrelated .FCS file types. Be sure your version of Windows associates .FCS files with the correct program.
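One way to tell a flow cytometry file from an unrelated .FCS file is to look at its first bytes, which should be a version string such as "FCS2.0" or "FCS3.0". The following R sketch (hypothetical helper is_flow_fcs, not part of FCSExtract) does that check; it is a quick filter, not full validation.

```r
# Minimal sketch (hypothetical helper): does the file start with an
# FCS version string like "FCS2.0" or "FCS3.0"?
is_flow_fcs <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  sig <- readBin(con, what = "raw", n = 6)
  # rawToChar() fails on embedded NUL bytes, so reject those first
  if (length(sig) < 6 || any(sig == as.raw(0))) return(FALSE)
  grepl("^FCS[0-9]\\.[0-9]$", rawToChar(sig), useBytes = TRUE)
}
```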

Software Requirements
Tested only under Windows 2000/XP, but should work on other versions of Windows.

Download the executable and run it from any convenient directory. No special installation is required.

Step-by-Step Procedure
1.  Double-click on the FCSExtract icon to start the program.

2.  Press the Read button, select a .fcs file, and press the Open button.


Example 1.  $DATATYPE = I for integer data

 

Example 2.  $DATATYPE = F for floating-point data

The contents of the "Text" portion of the .fcs file are displayed on the screen (see file format).

3.  A .txt file containing the information on the Text Segment tabsheet is automatically written to disk if AutoSave to .txt is checked.

4.  Press the Data Segment tabsheet to see the "raw data":


Example 1.  $DATATYPE = I for integer data

 

Example 2.  $DATATYPE = F for floating-point data
(Note: String info is not available in this dataset.)

 

Only the first 100,000 rows are shown on the screen.
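The difference between the two examples is how the DATA segment bytes are decoded. Below is a minimal R sketch, not FCSExtract's own code, of reading list-mode data with readBin(). It assumes the caller has already positioned the connection at the start of the DATA segment (for example with seek()), that every parameter uses the same bit width ($PnB, a multiple of 8), that integer data are unsigned, and that $BYTEORD is "1,2,3,4" (little-endian) or "4,3,2,1" (big-endian). The function name read_fcs_data is hypothetical.

```r
# Minimal sketch: decode a list-mode DATA segment into an events x parameters matrix.
read_fcs_data <- function(con, datatype, bits, byteord, n_par, n_events) {
  endian <- if (startsWith(byteord, "1,2")) "little" else "big"
  n <- n_par * n_events
  values <- switch(datatype,
    # signed = FALSE is only honoured for 1- and 2-byte integers
    I = readBin(con, "integer", n = n, size = bits %/% 8,
                signed = FALSE, endian = endian),
    F = readBin(con, "numeric", n = n, size = 4, endian = endian),  # 32-bit float
    D = readBin(con, "numeric", n = n, size = 8, endian = endian),  # 64-bit double
    stop("unsupported $DATATYPE: ", datatype))
  # Events are stored one after another, each holding n_par values
  matrix(values, nrow = n_events, ncol = n_par, byrow = TRUE)
}
```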

5.  Return to the Text Segment tabsheet and press the Write button to create a .csv file with, by default, the same name as the .fcs file:

Accept the default name, or specify another one.  Press the Save button.
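For anyone scripting the same step in R, here is a minimal sketch of writing decoded events to a .csv named after the .fcs file. The helper write_fcs_csv and its arguments are hypothetical; events is assumed to be a numeric matrix and col_names its column names.

```r
# Minimal sketch (hypothetical helper): write events to "<name>.csv" next to "<name>.fcs".
write_fcs_csv <- function(fcs_path, events, col_names) {
  csv_path <- sub("\\.fcs$", ".csv", fcs_path, ignore.case = TRUE)
  # Guard: never overwrite the input if it lacks a .fcs extension
  if (identical(csv_path, fcs_path)) csv_path <- paste0(fcs_path, ".csv")
  df <- as.data.frame(events)
  names(df) <- col_names
  write.csv(df, csv_path, row.names = FALSE)   # no extra row-number column
  csv_path
}
```

Using row.names = FALSE keeps the file to exactly one column per parameter, which is what a spreadsheet or read.csv() expects.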

6.  A batch of .fcs files can be converted in one step:  Select the Batch Extraction tabsheet.

Select the directory containing the input .fcs files.

By default, all fields are selected for processing; uncheck any fields you do not want. To process only some files, select one or more files in the list and press the Process Selected button. To process all files, press the Process Batch button.
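A batch run amounts to applying a single-file conversion to every .fcs file in a directory. The R sketch below shows that pattern under the assumption that convert_one is some function (hypothetical here) that converts one file given its path; a failure on one file is reported and does not stop the rest of the batch.

```r
# Minimal sketch: apply a converter to every .fcs file in a directory.
# Returns a named logical vector: TRUE where conversion succeeded.
convert_fcs_dir <- function(dir, convert_one) {
  files <- list.files(dir, pattern = "\\.fcs$", ignore.case = TRUE,
                      full.names = TRUE)
  vapply(files, function(f) {
    tryCatch({
      convert_one(f)
      TRUE
    }, error = function(e) {
      message("Failed: ", basename(f), ": ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
}
```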

7.  View the .csv file in an ASCII editor, or in Excel (but Excel will only show the first 65,536 rows).

8.  The Setup tabsheet controls the names of the columns written to the .csv file. Here's an example of the possible column names for the provided CC4_067_BM.fcs file:

9.  The resulting .csv file (CC4_067_BM.csv in this case) can be processed by a variety of analysis programs. One good (and free) program for such analysis is R. The following is a very brief introduction to analyzing this data in R.

To load this CC4_067_BM.csv file into R, the following statement is needed at the R command prompt (>):

> fcs <- read.csv("CC4_067_BM.csv")

This creates a "data frame" called "fcs" that contains all the data in the .csv file.  To find the size of the dataframe, and the names of the columns, enter these commands:

> dim(fcs)
[1] 5634 10

> names(fcs)
[1] "Pulse.Width" "FSC" "SSC" "FITC" "PE" "PE.Cy5" "PE.Cy7" "APC"
[9] "APC.Cy7" "Raw.Time"

With only 5634 rows (events) and 10 columns, this is quite a small flow cytometry dataset.
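The dots in names such as "PE.Cy7" come from R rather than from the file: by default read.csv() passes the header through make.names() (check.names = TRUE), which replaces characters that are not legal in R names with dots. The .csv header presumably contained names like "PE-Cy7" (an assumption; the header itself is not shown here). The short example below uses a made-up CSV string.

```r
# read.csv() rewrites header names with make.names() by default
make.names(c("Pulse Width", "PE-Cy7", "APC-Cy7", "Raw Time"))
# returns "Pulse.Width" "PE.Cy7" "APC.Cy7" "Raw.Time"

csv <- "FSC,PE-Cy7\n100,200\n"                     # tiny made-up CSV
names(read.csv(text = csv))                        # "FSC" "PE.Cy7"
names(read.csv(text = csv, check.names = FALSE))   # "FSC" "PE-Cy7"
```

With check.names = FALSE the original names are kept, but they must then be quoted with backticks, e.g. fcs$`PE-Cy7`.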

To see a histogram of the PE.Cy7 quantity, enter this command:

> hist(fcs$PE.Cy7, breaks=128,
     xlab="Channel", ylab="Count",
     main="Histogram (12-bit) of PE.Cy7")

This creates the histogram shown at the upper-left below. Note that while most flow cytometry software shows data in the range 0..255 (8-bit data), the full range of the data is usually 0..4095 (12-bit data).
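Going from the 12-bit range to the 8-bit range is integer division by 16, since 4096 / 256 = 16. A few sample channel values make the mapping concrete:

```r
# 12-bit channels (0..4095) collapse onto 8-bit channels (0..255)
channel12 <- c(0, 15, 16, 2048, 4095)
channel12 %/% 16   # returns 0 0 1 128 255
```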

A line graph showing much the same information as the histogram can be created with these R statements:

> h <- hist(round(fcs$PE.Cy7), breaks=128,plot=FALSE)
> plot(h$mids, h$counts, type="l",
  xlab="Channel", ylab="Count",
  main="PE.Cy7")

The "h" object holds the histogram's bin boundaries (breaks), counts, and midpoints (mids), so it can be plotted as a line graph using the R plot command (see upper right figure below):
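To see exactly what hist(..., plot=FALSE) returns, here is a small example with a made-up vector and explicit breaks. By default R's bins are right-closed, with the lowest bin also including its left edge.

```r
# Inspect the object returned by hist() without drawing anything
x <- c(1, 2, 2, 3, 3, 3, 7, 8)
h <- hist(x, breaks = c(0, 2, 4, 6, 8), plot = FALSE)
h$breaks   # bin edges: 0 2 4 6 8
h$counts   # events per bin [0,2], (2,4], (4,6], (6,8]: 3 3 0 2
h$mids     # bin midpoints: 1 3 5 7
```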

The lower-left scatterplot (above) is from a single R statement:

> plot(fcs$FITC, fcs$PE.Cy7, pch=".", main="PE.Cy7 Vs FITC")

The frequency color density plot is a bit more difficult to create (lower right above).  The 12-bit data are first reduced to 8-bit data and a 256-by-256 matrix is formed and plotted with a particular color palette:

# Reduce 0:4095 (12-bit) to 256 bins, numbered 1:256 for R's 1-based indexing
i <- 1 + trunc( fcs$PE.Cy7 / 16 )
j <- 1 + trunc( fcs$FITC / 16 )

# Use a 256-by-256 matrix of event counts to form the image
m <- matrix(0, 256, 256)

for (k in seq_along(i))
{
  m[j[k], i[k]] <- m[j[k], i[k]] + 1
}

image(m, col=c("white", rainbow(16)), axes=FALSE,
  xlab="FITC", ylab="PE.Cy7",
  main="PE.Cy7 Vs FITC")

# Use a bit of a trick to label the axes 0..256 instead of 0.0 .. 1.0
axis(side=1, at=seq(0,1,length=9), labels=256*seq(0,1,length=9))  # 1 = bottom
axis(side=2, at=seq(0,1,length=9), labels=256*seq(0,1,length=9))  # 2 = left
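The for loop that fills the matrix is easy to read but slow on large datasets. A vectorized alternative (our own sketch, with the hypothetical name bin_counts_256) counts every bin pair in one call to tabulate(). It also clamps values to 0..4095, an added assumption, so that out-of-range data cannot produce an index outside 1..256.

```r
# Vectorized 256-by-256 bin counts: rows follow x, columns follow y
bin_counts_256 <- function(x, y) {
  i <- 1 + trunc(pmin(pmax(x, 0), 4095) / 16)   # row bin (1..256)
  j <- 1 + trunc(pmin(pmax(y, 0), 4095) / 16)   # column bin (1..256)
  # Column-major linear index: element [i, j] is i + (j - 1) * 256
  matrix(tabulate(i + (j - 1) * 256, nbins = 256 * 256), 256, 256)
}

# m <- bin_counts_256(fcs$FITC, fcs$PE.Cy7)   # same counts as the loop
```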

All of the above plots can be created with this single R statement:

> source("http://research.stowers-institute.org/efg/ScientificSoftware/Utility/FCSExtract/FCSsample.R")

The plots can be captured in a 2-by-2 matrix of plots in a PDF file, FCSsample.pdf, using these statements:

> pdf("FCSsample.pdf")
> par(mfrow=c(2,2))

> source("http://research.stowers-institute.org/efg/ScientificSoftware/Utility/FCSExtract/FCSsample.R")
> dev.off()

Note that PDF files (or Windows metafiles, if you use Windows) can become quite large with flow cytometry datasets, since these formats are not bitmaps: they store a drawing instruction for every plotted point. This small dataset created a 2 MB FCSsample.pdf.

A good alternative when working with large flow cytometry datasets is to create a PNG (Portable Network Graphics) bitmap file instead of a PDF or metafile. This is done by replacing the "pdf" device driver above with a "png" driver:

> png("FCSsample.png")
> par(mfrow=c(2,2))

> source("http://research.stowers-institute.org/efg/ScientificSoftware/Utility/FCSExtract/FCSsample.R")
> dev.off()

The resolution of the resulting PNG file may not be quite as good as that of the PDF, but FCSsample.png is only 7 KB instead of 2 MB.
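If the default PNG looks too coarse, the png() device accepts a larger canvas in pixels, trading file size for resolution. The sketch below uses made-up normally distributed values in place of the fcs columns so it runs on its own; the file name and the width, height, and pointsize values are illustrative choices, not recommendations from the original.

```r
# Sketch: a larger PNG canvas gives sharper plots at the cost of a bigger file
set.seed(1)
x <- rnorm(5000, 2000, 300)   # made-up values standing in for fcs$FITC
y <- rnorm(5000, 1500, 400)   # made-up values standing in for fcs$PE.Cy7

png("FCSsample_large.png", width = 1200, height = 1200, pointsize = 18)
par(mfrow = c(2, 2))
hist(y, breaks = 128, xlab = "Channel", ylab = "Count", main = "Histogram")
h <- hist(y, breaks = 128, plot = FALSE)
plot(h$mids, h$counts, type = "l", xlab = "Channel", ylab = "Count",
     main = "Line graph")
plot(x, y, pch = ".", main = "Scatterplot")
plot(density(y), main = "Density estimate")
dev.off()
```

With real data, the four plotting calls would be replaced by the source() statement shown above.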

Revision History


R Alternative to FCSExtract (6 Oct 2005)

The recent book, Bioinformatics and Computational Biology Solutions Using R and Bioconductor, briefly discusses an alternative Bioconductor solution that reads FCS files directly in R. Section 5.3.3 of the book, "FCS format," describes the FCS file format; I found that the package facsDorit, which it uses, relies on the prada package to do most of the work. [facsDorit was a bit hard to find. It can be downloaded as part of the DKFZ FACS example data at the bottom of this page.] Section 5.4.1, "Visualization at the level of individual cells," shows the prada package and how to fit a bivariate normal distribution to a 2D scatterplot (by Florian Hahne).

I have not yet tested the prada package directly with our data, but I modified their examples somewhat and think the bivariate fit may be a good way to select FCS data of interest.  Here's my R Code to exercise the prada package.
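For readers without prada, the idea of a bivariate normal selection can be approximated in base R. The sketch below is our own and is NOT prada's implementation, which may differ (for example, by using a robust fit). It estimates a plain mean and covariance for two channels, computes each event's squared Mahalanobis distance, and keeps events inside the ellipse that contains the chosen fraction of a fitted bivariate normal. The function name select_bivariate_normal is hypothetical.

```r
# Sketch: keep events inside a bivariate normal probability ellipse.
# Plain (non-robust) mean and covariance, so outliers pull the fit.
select_bivariate_normal <- function(x, y, level = 0.95) {
  xy <- cbind(x, y)
  d2 <- mahalanobis(xy, colMeans(xy), cov(xy))  # squared Mahalanobis distance
  d2 <= qchisq(level, df = 2)                   # chi-square cutoff, 2 dimensions
}

# keep <- select_bivariate_normal(fcs$FSC, fcs$SSC)
# plot(fcs$FSC, fcs$SSC, pch = ".", col = ifelse(keep, "blue", "grey"))
```

Under a bivariate normal model, the squared distance follows a chi-square distribution with 2 degrees of freedom, which is why qchisq(level, df = 2) gives the cutoff.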


Updated
23 March 2006