Notes about CAMDA '04 Raw Data
The Complete and Overview datasets were downloaded from the CAMDA '04 site as a ZIP file. When uncompressed, this ZIP file produced a datasets folder with three subfolders:
The files of interest in these folders were the following tab-delimited text files:
These same files are also available from the DeRisi Lab Malaria Transcriptome Database page. The data were manipulated with several tools, including an Access database, Excel spreadsheets, and the "R" analysis language. Access and Excel read these tab-delimited files directly using their default settings. In R, however, I had problems parsing the file, and the problems appeared to be related to the Name field in the Complete_Dataset.txt file. Rather than work out a more complicated way to parse the file in R, I used Excel to create a new CSV file for use with R. To make analysis in R a bit easier, this new file, CAMDA04-Complete_Dataset.csv, had these modifications from the original file:
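The notes do not say exactly what tripped up R's parser. One common cause with free-text annotation fields is stray quote characters (for example, a 3' or 5' in a gene description), which some of R's text-reading functions treat as quoting characters by default. The sketch below, in Python rather than the Excel route actually used, assumes that was the problem: it reads the tab-delimited text with quote handling switched off and writes a CSV in which every field is quoted. The function names and the default file encoding are illustrative assumptions, not part of the original workflow.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to CSV text.

    Assumption: parsing trouble comes from quote characters embedded in
    text fields (e.g. 3' or 5' in an annotation). Quote handling is
    switched off on input, so quotes are read as ordinary characters,
    and every field is quoted on output.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # QUOTE_ALL wraps each field in double quotes; embedded double quotes
    # are escaped by doubling them (the csv module's default behavior).
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def convert_file(src_path: str, dst_path: str,
                 encoding: str = "latin-1") -> None:
    """File-level wrapper. The encoding is a guess, not taken from the notes."""
    with open(src_path, encoding=encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(tsv_to_csv(text))
```

For example, `convert_file("Complete_Dataset.txt", "CAMDA04-Complete_Dataset.csv")` would write a fully quoted CSV. Quoting every field costs a little file size but means a reader never has to guess whether an apostrophe or comma inside the Name field starts a new field.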
The empty "TP23" and "TP29" columns were added so that time points 1 through 48 could be treated consistently, since the dataset already contained many other missing data points. Download this zip file containing CAMDA04-Complete_Dataset.csv for use with the R code provided on this site. In the Access database, the "Name" field was further parsed into separate GeneID, Annotation, and Comments fields.
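Adding the empty TP23 and TP29 columns can be sketched programmatically as follows. This is a minimal Python sketch, assuming the time-point columns are named TP1 through TP48 (consistent with the TP23/TP29 names above), that non-time-point columns should come first in their original order, and that a missing value is written as an empty string, as an empty spreadsheet cell would be. The function name is hypothetical.

```python
def add_missing_time_points(header, rows, n_points=48, prefix="TP"):
    """Ensure every time-point column TP1..TPn exists, in numeric order.

    Assumptions (not from the original notes): time-point columns are
    named TP1..TP48, other columns come first in their original order,
    and missing values are represented by empty strings.
    """
    tp_cols = [f"{prefix}{i}" for i in range(1, n_points + 1)]
    tp_set = set(tp_cols)
    # Keep non-time-point columns (e.g. Name) ahead of the time points.
    other_cols = [c for c in header if c not in tp_set]
    new_header = other_cols + tp_cols
    # Each row is a dict keyed by column name; absent columns become "".
    new_rows = [{col: row.get(col, "") for col in new_header} for row in rows]
    return new_header, new_rows
```

With the standard library, `reader = csv.DictReader(f)` supplies `reader.fieldnames` and the rows, and `csv.DictWriter(out, fieldnames=new_header)` writes the result back out. With every column from TP1 to TP48 present, downstream code can index time points by number without special cases for the two absent ones.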
Also see:
Statistical Summary of CAMDA '04 Contest Datasets