Understanding CAMDA '06 Data



Clinical Data.  Four files were in clinical_data.zip:

"MFI original paper.pdf" is also included in the publications.zip and describes the survey results.

Answers to the 49-page "Self Administered Questionairre.doc" are in the Excel file "Illness Classificatoin SF36 MFI and Symptons.xls". (Note the misspellings of "classification" and "symptoms" in the filename.) The first line of tabsheet "Class Demo" was blank and was deleted to simplify processing.

This "illness" spreadsheet can be considered the master patient file, since it contains demographic information about each patient (gender, age, ...) as well as clinical data from the survey.

An R program, Clinical.R (updated 11 Jan 2006), was used to produce a list of the fields in this spreadsheet (see below).  A separate R program, descriptivestats.R, gave some general statistics and charts describing these fields.

# 227 rows by 88 columns
# Patient ID is "ABTID", which can be used as a database key
> print(names(illness))

 [1] "ABTID"  [Patient ID]            "Intake Classific"
 [3] "Empiric"                        "CLUSTER"
 [5] "Onset"                          "Yrs Ill"
 [7] "sex"                            "age"
 [9] "DOB"                            "race"
[11] "ethnic"                         "BMI"
[13] "Exclusion"                      "MDDM Current"
[15] "MDD Current"                    "Phys_Funct"
[17] "Role_Physic"                    "Bodily_Pain"
[19] "Gnrl_Hlth"                      "Vitality"
[21] "Social_Funct"                   "Role_Emotional"
[23] "Mental_Hlth"                    "Gen Fat"
[25] "Phys Fat"                       "Activ Reduc "
[27] "Motiv Reduc "                   "Mental Fat"
[29] "Sore Throat"                    "Freq Sore Throat"
[31] "Severity Sore Throat"           "Tender Nodes"
[33] "Freq Nodes"                     "Severity Nodes"
[35] "Diarrhea"                       "Freq Diarrhea"
[37] "Severity Diarrhea"              "Post Exertion Fatigue"
[39] "Freq Post Exertion Fatigue"     "Severity Post Exertion Fatigue"
[41] "Muscle Pain"                    "Freq Muscle Pain"
[43] "Severity Muscle Pain"           "Joint Pain"
[45] "Freq Joint Pain"                "Severity Joint Pain"
[47] "Fever"                          "Freq Fever"
[49] "Severity Fever"                 "Chills"
[51] "Freq Chills"                    "Severity Chills"
[53] "Unrefreshing Sleep"             "Freq Unrefreshing Sleep"
[55] "Severity Unrefreshing Sleep"    "Sleep Problems"
[57] "Freq Sleep Problems"            "Severity Sleep Problems"
[59] "Headache"                       "Freq Headache"
[61] "Severity Head Ache"             "Memory"
[63] "Freq Memory"                    "Severity Memory"
[65] "Concentration"                  "Freq Concentration"
[67] "Severity Concentration"         "Nausea"
[69] "Freq Nausea"                    "Severity Nausea"
[71] "Abdominal Pain"                 "Freq Abdominal Pain"
[73] "Severity Adominal Pain"         "Sinus Nasal"
[75] "Freq Sinus Nasal"               "Severity Sinus Nasal"
[77] "Shortness of breath"            "Freq Shortness of breath"
[79] "Severity Shortness of breath"   "Photophobia"
[81] "Freq Photophobia"               "Severity Photophobia"
[83] "Depression"                     "Freq Depression"
[85] "Severity Depression"            "F86"
[87] "F87"                            "F88" 
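Because "ABTID" is meant to serve as the key, it is worth confirming that it is present and unique before joining the illness spreadsheet to the blood-profile spreadsheet described below. The following is a minimal base-R sketch: the tiny data frames and their values are hypothetical stand-ins for the real 227-row tables, and reading the Excel files themselves (done in Clinical.R via RODBC) is not shown.

```r
# Hypothetical miniature stand-ins for the two spreadsheets; the real
# tables have 227 rows (88 and 71 columns respectively).
illness <- data.frame(
  ABTID = c("1001", "1002", "1003"),
  sex   = c("F", "F", "M"),
  age   = c(45, 52, 38),
  stringsAsFactors = FALSE
)
blood <- data.frame(
  ABTID = c("1003", "1001", "1002"),
  WBC   = c(6.1, 5.4, 7.2),
  stringsAsFactors = FALSE
)

# TRUE when the key column has no missing and no repeated values,
# i.e. it can serve as a database key.
is_key <- function(df, key = "ABTID") {
  ids <- df[[key]]
  !anyNA(ids) && anyDuplicated(ids) == 0
}

stopifnot(is_key(illness), is_key(blood))

# Join the master patient file with the blood profile on ABTID;
# merge() matches rows by key, so row order in either table does not matter.
patients <- merge(illness, blood, by = "ABTID")
print(patients)
```

With the real data, merge(illness, blood, by = "ABTID", all = TRUE) would also keep patients that appear in only one of the two files, filling the unmatched columns with NA.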

9 January 2006 Addition:

A new paper explains the "Intake Classific" and "CLUSTER" fields listed above, which also appear in the general statistics:
Chronic fatigue syndrome - a clinically empirical approach to its definition and study
William C. Reeves, Dieter Wagner, Rosane Nisenbaum, James F. Jones, Brian Gurbaxani, Laura Solomon, Dimitris A. Papanicolaou, Elizabeth R. Unger, Suzanne D. Vernon, Christine Heim
BMC Medicine 2005, 3:19 (15 December 2005), provisional PDF.

"Intake Classific" field (described under "Methods", p. 2 of this paper; see also Table 1). The meaning of each value was deduced by matching its frequency against the counts and descriptions in the paper.

Intake Classific   Frequency   Description
Ever CFS               58      ever chronic fatigue syndrome (CFS)
Ever CFS-MDDm          27      "CFS accompanied by melancholic depression"
Ever ISF               59      "insufficient symptoms or fatigue"; "persons with medically unexplained fatigue not CFS, which we term ISF"
Ever ISF-MDDm          28      "ISF plus melancholic depression"
Nonfatigued            55      "non-fatigued controls matched to CFS on sex, race, age and body mass index"
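The frequency matching used to decode "Intake Classific" can be sketched in base R as follows. This assumes the spreadsheet stores some raw value per category; the codes A to E below are hypothetical, and the approach only works because the five published counts are all distinct.

```r
# Published frequencies for each "Intake Classific" category
# (Reeves et al., BMC Medicine 2005), as listed in the table above.
paper_counts <- c(
  "Ever CFS"      = 58,
  "Ever CFS-MDDm" = 27,
  "Ever ISF"      = 59,
  "Ever ISF-MDDm" = 28,
  "Nonfatigued"   = 55
)

# Map each distinct raw value to the paper category with the same count.
# A raw value whose count matches no published count is mapped to NA.
deduce_labels <- function(values, counts) {
  observed <- table(values)
  idx <- match(as.vector(observed), counts)
  setNames(names(counts)[idx], names(observed))
}

# Hypothetical raw codes A-E standing in for the spreadsheet's values.
codes <- rep(c("A", "B", "C", "D", "E"), times = c(59, 55, 27, 58, 28))

decoded <- deduce_labels(codes, paper_counts)
print(decoded)
```

The five published counts sum to 227, the number of rows in the spreadsheet, so a successful match also accounts for every patient.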

The "CLUSTER" field (described on p. 12 and in Table 3, p. 24, of this paper) comes from a two-step cluster analysis.

Cluster   Frequency   Description
Worst         30      Most severe ("lowest SF-36, highest MFI")
Middle        67      Intermediate
Least         67      Least severe ("scores essentially reflected population norms.")

11 January 2006: The Clinical.R program was modified to create a file, NoExclusions.csv, that lists the 164 patient IDs in these three clusters (30 + 67 + 67) along with each patient's cluster name (Least, Middle, or Worst). This file was used to assign the cluster name to each of the microarray data files; see the Gene Expression page.
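A file like NoExclusions.csv could be produced along the following lines. This is a minimal sketch, not the actual Clinical.R code: the numeric CLUSTER coding (1, 2, 3), the lookup table, the output column name "Cluster", and the simulated illness table are all hypothetical.

```r
# Hypothetical lookup from raw CLUSTER values to cluster names; how the
# spreadsheet actually codes the clusters is an assumption here.
cluster_names <- c("1" = "Least", "2" = "Middle", "3" = "Worst")

# Keep only patients assigned to one of the three clusters and attach
# the cluster name, giving an ID column and a cluster-name column.
no_exclusions <- function(illness, lookup) {
  key  <- as.character(illness$CLUSTER)
  keep <- key %in% names(lookup)      # NA and unknown codes are dropped
  data.frame(
    ABTID   = illness$ABTID[keep],
    Cluster = unname(lookup[key[keep]]),
    stringsAsFactors = FALSE
  )
}

# Simulated illness table: 227 patients, 164 of them clustered.
illness <- data.frame(
  ABTID   = sprintf("%04d", 1:227),
  CLUSTER = c(rep(1, 67), rep(2, 67), rep(3, 30), rep(NA, 63)),
  stringsAsFactors = FALSE
)

ids <- no_exclusions(illness, cluster_names)
out_file <- file.path(tempdir(), "NoExclusions.csv")
write.csv(ids, out_file, row.names = FALSE)
print(table(ids$Cluster))
```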

The other Excel file with clinical data, "Complete Blood Evaluation.xls", contains a "Blood Profile" tabsheet. The Clinical.R program (which requires the RODBC package) also listed the fields in this spreadsheet (see below), and descriptivestats.R gave some general statistics and charts describing them.

# 227 rows by 71 columns
# Patient ID is "ABTID", which can be used as a database key
> print( names(blood) )

 [1] "ABTID"  [Patient ID]             "Collection Date Blood"
 [3] "Collection Time Blood"           "WBC"
 [5] "WBC alert flag"                  "RBC"
 [7] "RBC alert flag"                  "HGB"
 [9] "HGB alert flag"                  "HCT"
[11] "HCT alert flag"                  "MCV"
[13] "MCV alert flag"                  "MCH"
[15] "MCH alert flag"                  "MCHC"
[17] "MCHC alert flag"                 "RDW "
[19] "RDW alert flag"                  "PLT"
[21] "PLT alert flag"                  "% granulocytes"
[23] "% granulocytes alert flag"       "% lymphocytes"
[25] "% lymphocytes alert flag"        "% mononuclear cells"
[27] "% mononuclear cells alert flags" "% eosinophils"
[29] "% eosinophils alert flags"       "% basophils"
[31] "% basophils alert flag"          "# granulocytes"
[33] "# granulocytes alert flag"       "# lymphocytes"
[35] "# lymphocytes alert flag"        "# mononuclear cells"
[37] "# mononuclear cells alert flag"  "# eosinophils"
[39] "# eosinophils alert flags"       "# basophils"
[41] "# basophils alert flag"          "sodium"
[43] "sodium alert flag"               "potassium"
[45] "potassium alert flag"            "chloride"
[47] "chloride alert flag"             "CO2"
[49] "CO2 alert flag"                  "anion gap"
[51] "anion gap alert flag"            "glucose"
[53] "glucose alert flag"              "BUN "
[55] "BUN alert flag"                  "creatinine"
[57] "creatinine alert flag"           "total protein"
[59] "total protein alert flag"        "albumin"
[61] "albumin alert flag"              "calcium"
[63] "calcium alert flag"              "bili total"
[65] "bili total alert flag"           "AST/SGOT"
[67] "AST/SGOT alert flag"             "ALT/SGPT"
[69] "ALT/SGPT alert flag"             "alk phos"
[71] "alk phos alert flag"           

I am not aware of any further documentation of these fields (for example, their units, reference ranges, or how the alert flags were set). Until they are unambiguously defined, conclusions based on them may not be valid, which limits their usefulness.
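The listing does at least suggest a structure: most measurements are followed by an "alert flag" column, although the names are inconsistent ("flag" versus "flags", and trailing spaces in "RDW " and "BUN "). A minimal base-R sketch for pairing each measurement with its flag, assuming every flag column is named "<measurement> alert flag" or "<measurement> alert flags"; it says nothing about what the flag values mean, which remains undocumented.

```r
# A subset of the "Blood Profile" column names printed above, copied
# verbatim, including the trailing spaces in "RDW " and "BUN " and the
# mix of "alert flag" and "alert flags".
cols <- c("ABTID", "Collection Date Blood", "WBC", "WBC alert flag",
          "RDW ", "RDW alert flag", "% eosinophils",
          "% eosinophils alert flags", "BUN ", "BUN alert flag",
          "alk phos", "alk phos alert flag")

# Pair each measurement column with its alert-flag column (NA if none),
# returning the original, untrimmed names so they can index the data.
pair_alert_flags <- function(cols, id = "ABTID") {
  clean   <- trimws(cols)                      # ignore stray spaces
  is_flag <- grepl(" alert flags?$", clean)    # "flag" or "flags"
  is_meas <- !is_flag & clean != id
  base    <- sub(" alert flags?$", "", clean)  # column a flag refers to
  data.frame(
    measure = cols[is_meas],
    flag    = cols[is_flag][match(clean[is_meas], base[is_flag])],
    stringsAsFactors = FALSE
  )
}

flag_map <- pair_alert_flags(cols)
print(flag_map)
```

Applied to names(blood), columns without a flag, such as the collection date and time, would get NA in the flag column.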


Last updated: 11 Jan 2006