Analysis of CDP data using R

Timothy C. Hain, MD • Page last modified: September 4, 2021

This page discusses plotting posturography scores by decade. The data are in a CSV file in which each column is a posturography score, as dumped by the Neurocom Posturography utility software and imported into MySQL.

The simple but useless approach is as follows:

CDP scatterplot

A scatterplot of posturography scores vs. age gives you a pretty random-looking set of dots. To make sense of this (to some extent, anyway), one needs to compute means and SDs by age group, here by decade.

Step 1: Load data

This is the R code that reads a table in CSV format into a dataframe and cleans it up.

## this reads a dump from the query program into a dataframe called cdp.
cdp=read.csv("cdp_gainTC.csv", header=TRUE, skip=11, sep=",",stringsAsFactors=FALSE)

# Reformat: convert the date columns from text (month/day/year) to Date objects
cdp$DOS_1 = as.Date(cdp$DOS_1, format="%m/%d/%Y")
cdp$DOS_2 = as.Date(cdp$DOS_2, format="%m/%d/%Y")
cdp$Birth = as.Date(cdp$Birth, format="%m/%d/%Y")
cdp$FirstDt = as.Date(cdp$FirstDt, format="%m/%d/%Y")
# Approximate age in years: year of DOS_1 minus year of birth (ignores month and day)
cdp$Age = as.numeric(format(cdp$DOS_1, "%Y")) - as.numeric(format(cdp$Birth, "%Y"))

cdp <- dplyr::rename(cdp, Composite = COMPOSITE_1) # dplyr's rename() takes new_name = old_name
cdp$Composite[cdp$Age < 10] <- NA # Implausibly young: drop the composite score
cdp$Composite[cdp$Age > 90] <- NA # Too old
cdp$Age[cdp$Age < 10] <- NA # Implausibly young: drop the age as well
cdp$Age[cdp$Age > 90] <- NA # Too old
cdp$Composite[cdp$Composite > 100] <- NA # Out-of-range scores (above 100 or below 0)
cdp$Composite[cdp$Composite < 0] <- NA
cdp$decade <- as.integer(cdp$decade) # Assumes the dump already has a decade column; it is used below to group the data by decade.

total <- sprintf("Posturography vs age: n=%d, Chicago Dizziness and Hearing", nrow(cdp))
means <- aggregate(Composite~decade, cdp, mean)
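
The introduction calls for both means and SDs, while the code above computes only the mean. Here is a minimal sketch of one way to get the mean, SD and count per decade in base R. It uses a small made-up data frame: `toy`, its values, and the name `decade_stats` are illustrative assumptions, not clinic data. The decade is derived from age by rounding down to the nearest 10, which may differ from how the query program defines it.

```r
# Made-up example data: NOT real posturography results.
toy <- data.frame(
  Age       = c(23, 27, 34, 38, 45, 52, 58, 61, 67, 74),
  Composite = c(80, 84, 78, 82, 75, 70, 74, 68, 62, 60)
)

# Assumption: decade = age rounded down to the nearest 10 (23 -> 20).
toy$decade <- as.integer(floor(toy$Age / 10) * 10)

# Mean, SD and count of the composite score within each decade.
# Rows with NA in Composite or decade are dropped by the formula interface.
dec_mean <- aggregate(Composite ~ decade, toy, mean)
dec_sd   <- aggregate(Composite ~ decade, toy, sd)
dec_n    <- aggregate(Composite ~ decade, toy, length)

# All three results are sorted by decade, so the columns line up row by row.
decade_stats <- data.frame(
  decade = dec_mean$decade,
  mean   = dec_mean$Composite,
  sd     = dec_sd$Composite,
  n      = dec_n$Composite
)
print(decade_stats)
```

A decade with a single observation has an undefined SD, which R reports as NA, so the count column is worth checking before interpreting the SDs.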


Step 2: Create a boxplot of CDP vs decade.

CDP by decade

This graph was produced using the following code:

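# factor() makes plot() draw one box per decade rather than a scatterplot
plot(factor(cdp$decade), cdp$Composite, col="coral",
     main=total, # alternative title: main="Posturography vs Age"
     xlab="Decade", ylab="Composite score")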
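
The `means` data frame computed in Step 1 is not used in the plot itself. One possible way to show the decade means on top of the boxes is to call `boxplot()` directly and overlay the means with `points()`. The sketch below uses made-up data; `toy`, `toy_means` and `bp` are hypothetical names chosen for illustration.

```r
# Made-up example data: NOT real posturography results.
toy <- data.frame(
  decade    = c(20, 20, 20, 40, 40, 40, 60, 60, 60),
  Composite = c(78, 82, 86, 70, 75, 80, 58, 64, 70)
)

# Mean composite score per decade, sorted by decade.
toy_means <- aggregate(Composite ~ decade, toy, mean)

# One box per decade; boxplot() treats decade as a grouping variable.
bp <- boxplot(Composite ~ decade, data = toy, col = "coral",
              main = "Made-up data: composite score by decade",
              xlab = "Decade", ylab = "Composite score")

# Boxes sit at x = 1, 2, 3, ... in the same order as the sorted decades,
# so the means can be overlaid at those positions.
points(seq_along(toy_means$decade), toy_means$Composite, pch = 19)
```

The object returned by `boxplot()` also holds the per-group counts (`bp$n`) and the five-number summaries (`bp$stats`), which can be handy for tabulating what the plot shows.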