Statistical Summary of CAMDA '04 Contest Datasets
| "Complete" Dataset | |
|---|---|
| 7091 | Oligos in "Complete" dataset |
| -216 | Oligo_ID is "empty" |
| 6875 | Unique Oligo_IDs |
| -1265 | "NULL" Gene ID (no "annotations" are present, but 690 had some comments) [Note: the "NULL" string here is not the same as "null" in database terminology] |
| -2 | GeneID was missing (E9299_3 and N1) ["null" in database terminology] |
| 5608 | Oligos with associated GeneIDs. 4314 unique GeneIDs. 35 genes have 5 or more oligos. |
| "Quality Control Set" | |
| 5071 | Oligos in "Complete" set of 6875 unique oligos |
| +9 | "New" oligos not in "Complete" set |
| 5080 | Unique oligos in "Quality Control Set". Note: Bozdech reports: "Fourier analysis was performed on each profile in the quality-controlled set (5081 oligonucleotides)." |
| -555 | "NULL" Gene ID (but 87 had some comments) |
| 4525 | Oligos with associated GeneIDs. 3533 unique Gene IDs, of which 3529 are in the list of unique Gene IDs from the "Complete" set. The four new Gene IDs are: ITS2, P, Plastid Genome, snRNA? 24 genes have 5 or more oligos, including gene "Plastid Genome" (27 oligos), gene MAL6P1.147 (15 oligos), and gene PFI1475w (10 oligos). |
| "Overview" Dataset | |
| 3711 | Oligos in "Complete" set of 6875 unique oligos |
| +8 | "New" oligos not in "Complete" set and introduced in the QC set: ptrgln, ptrgly, ptrgly2, ptrphe, ptrpro, ptrthr, ptrtrp, snr1 |
| 3719 | Unique oligos in "Overview" dataset. All "Overview" oligos are in the QC dataset. |
| -335 | "NULL" Gene ID (but 56 had some comments) |
| 3384 | Oligos with associated GeneIDs. 2687 unique Gene IDs. Gene "P" in the QC dataset was renamed to gene "PF14_0338" in the Overview set. The annotation information for these three genes changed slightly between the QC dataset and the Overview dataset: PFE0040c, PF11_0358, PF14_0451. Note: Bozdech reports in the P. falciparum Transcriptome Overview: "The overview set represents 2714 unique ORFs (3395 oligonucleotides). An additional 324 oligonucleotides represent ORFs that are not currently part of the manually annotated collection." The caption to Figure 2 shows "transcriptional profiles for 2712 genes". |
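
The running totals in this summary can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: the names `TALLIES`, `check_tallies`, and `BOZDECH_OVERVIEW` are hypothetical, and the numbers are copied from the three sections above. It adds each adjustment to the running count and compares the result with the reported subtotal.

```python
# Each dataset: starting count, then (adjustment, reported subtotal) pairs.
# All figures are copied from the summary; the structure itself is an
# illustrative assumption, not the original author's method.
TALLIES = {
    "Complete": (7091, [(-216, 6875), (-1265 - 2, 5608)]),
    "Quality Control Set": (5071, [(+9, 5080), (-555, 4525)]),
    "Overview": (3711, [(+8, 3719), (-335, 3384)]),
}

# Oligo counts quoted from Bozdech for the Overview set:
# 3395 annotated plus 324 additional oligonucleotides.
BOZDECH_OVERVIEW = (3395, 324)


def check_tallies(tallies):
    """Return a list of (dataset, adjustment, computed, reported) mismatches."""
    mismatches = []
    for name, (start, steps) in tallies.items():
        running = start
        for delta, reported in steps:
            running += delta
            if running != reported:
                mismatches.append((name, delta, running, reported))
    return mismatches


mismatches = check_tallies(TALLIES)
print(mismatches)  # an empty list means every subtotal is consistent

# Bozdech's two Overview figures sum to the 3719 unique oligos counted here.
print(sum(BOZDECH_OVERVIEW) == TALLIES["Overview"][1][0][1])
```

Every subtotal in the summary is internally consistent under this check. The remaining differences from the published figures (5080 versus 5081 oligonucleotides in the QC set, and 2687 versus 2714 or 2712 unique genes in the Overview set) are discrepancies between sources, not arithmetic errors in the tallies.
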