Study hospital readmission risk to assess the impact of telehealth interventions on diabetic patients with the ultimate goal of reducing the 30-day readmission rate.
Description: The dataset includes over 100,000 hospital discharges of over 70,000 diabetic patients from 130 hospitals across the United States from 1999 to 2008. All patients were hospital inpatients for 1 to 14 days and received both lab tests and medications while in the hospital. The 130 hospitals represented in the dataset vary in size and location: 58 are in the northeast United States and 78 are mid-sized (100 to 499 beds).
- The dataset has a total of 101,766 observations of 45 variables. Several variables are factors, including race, gender, age, admissionType, and admissionSource.
- The target variable to predict is readmission, stored as an integer (INT). It needs to be converted to a factor so that the CART model performs classification rather than regression.
- 75% of the dataset is used to train the model, and the remaining 25% is used to evaluate the accuracy of the model.
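The preparation steps above can be sketched in base R as follows. This is a minimal sketch on a small synthetic data frame, not the original preprocessing: the helper `split_train_test`, the seed, and the toy columns are assumptions for illustration, and it uses a plain random split (the original analysis may have used a stratified one). Only the names `readmission`, `readm.train`, and `readm.test` come from the write-up.

```r
# Hypothetical helper: convert the target to a factor, then split 75/25 at random.
split_train_test <- function(df, target = "readmission", train_frac = 0.75, seed = 1) {
  df[[target]] <- as.factor(df[[target]])   # factor target => classification tree
  set.seed(seed)                             # make the split reproducible
  n <- nrow(df)
  train_idx <- sample(n, size = floor(train_frac * n))
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

# Toy stand-in for the real data (all values are synthetic).
toy <- data.frame(readmission = rep(c(0L, 1L), times = c(90, 10)),
                  age = rep(c("[50-60)", "[60-70)"), 50),
                  stringsAsFactors = TRUE)

parts <- split_train_test(toy)
readm.train <- parts$train
readm.test  <- parts$test
print(c(train = nrow(readm.train), test = nrow(readm.test)))
```

With the real data, `toy` would be replaced by the data frame loaded from the source file.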
Hospitals estimated that the costs of a 30-day unplanned readmission and a telehealth intervention are $35,000 and $1,200, respectively. According to published information from a similar institution, telehealth interventions reduce the incidence of 30-day unplanned readmissions in the treated population by 25%.
The costs for all cases can be defined as below:
- Cost of True Negative (TN): 0
- Cost of False Positive (FP): $1,200
- Cost of False Negative (FN): $35,000
- Cost of True Positive (TP): $1,200 + ($35,000 * 0.75) = $27,450
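The four costs above follow directly from the three figures in the text. A short sketch of the arithmetic (variable names are illustrative, not from the original code):

```r
# Figures taken from the text above.
cost_readmit      <- 35000   # cost of one 30-day unplanned readmission
cost_intervention <- 1200    # cost of one telehealth intervention
reduction         <- 0.25    # share of readmissions the intervention prevents

# Expected cost of each (actual, predicted) outcome.
cost_TN <- 0                                                     # no intervention, no readmission
cost_FP <- cost_intervention                                     # intervention that was not needed
cost_FN <- cost_readmit                                          # missed readmission
cost_TP <- cost_intervention + (1 - reduction) * cost_readmit    # intervention, 75% still readmitted

print(c(TN = cost_TN, FP = cost_FP, FN = cost_FN, TP = cost_TP))
```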
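Based on the costs defined above, the loss matrix holds the extra cost of each misclassification relative to the correct decision for the same actual class, because rpart requires zeros on the diagonal: a false positive costs $1,200 - $0 = $1,200, and a false negative costs $35,000 - $27,450 = $7,550.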
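One way to derive the loss-matrix entries from the four case costs is to subtract, within each row of actual outcomes, the cost of the correct decision. The sketch below assumes rows are actual classes and columns are predicted classes; the object names `cost` and `loss` are illustrative.

```r
# Full cost matrix: rows = actual class, columns = predicted class.
cost <- matrix(c(    0,  1200,    # actual 0: TN, FP
                 35000, 27450),   # actual 1: FN, TP
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Subtract each row's diagonal entry so that correct decisions cost zero.
loss <- sweep(cost, 1, diag(cost))
print(loss)
```

The off-diagonal entries are then the false positive penalty (row "0", column "1") and the false negative penalty (row "1", column "0").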
This CART model seeks to minimize out-of-sample misclassification cost.
library(rpart)       #provides rpart()
library(rpart.plot)  #provides prp()
#CART model with cp=0.001
readm.mod = rpart(readmission ~ .,
                  data = readm.train,
                  method = "class",
                  cp = 0.001)
prp(readm.mod)
#predict the readmission
pred <- predict(readm.mod, newdata = readm.test, type = "class")
#confusion matrix. first arg=row, second arg=col - ex. [row,col]
confusion.matrix = table(readm.test$readmission, pred)
confusion.matrix
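#define the loss matrix for CART Model. rows=actual, cols=predicted
LossMatrix = matrix(0,2,2)
LossMatrix[1,2] = 1200   #false positive: actual 0, predicted 1
LossMatrix[2,1] = 7550   #false negative: actual 1, predicted 0 (35000 - 27450)
#the final table:
LossMatrix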
[Figure: screenshot of the loss matrix in R]
#CART model with cp=0.001 and the loss matrix
readm.mod.2 = rpart(readmission ~ .,
                    data = readm.train,
                    method = "class",
                    parms = list(loss = LossMatrix),
                    cp = 0.001)
prp(readm.mod.2, digits = 3)
Performance of 30-day unplanned readmissions using the test set.
Predictive Performance:
In each table, the columns indicate the predicted values and the rows indicate the actual values. Current Practice involves no telehealth intervention, so the values in column ‘1’ are all set to ‘0’ by default.
In the New Model, column ‘1’ counts the patients predicted to be readmitted within 30 days of discharge. The predictions come from the CART model that incorporates the costs of readmission and telehealth intervention defined in the loss matrix.
Monetary Cost Comparison:
Current Practice
- Total cost of readmission: 2,839 × $35,000 = $99,365,000
New Model
- Cost of intervention: (4,565 + 1,055) × $1,200 = $6,744,000
- Cost of readmission: ((1,055 × 0.75) + 1,784) × $35,000 = $90,133,750
- Total cost: $96,877,750
With an anticipated cost of $35,000 per unplanned readmission and $1,200 per telehealth intervention, Current Practice incurs a total cost of $99,365,000 from readmissions alone. With the New Model, which is built to minimize the total estimated cost through telehealth intervention:
- The total monetary cost is expected to fall by $2,487,250, a 2.5% decrease from the cost under Current Practice.
- The number of 30-day unplanned readmissions is expected to fall by 264 (1,055 × 0.25, rounded), a 9.3% decrease from Current Practice.
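The comparison above can be reproduced from the confusion-matrix counts quoted in the text. This is a sketch only: the function `total_cost` and its argument names are hypothetical, and the counts (4,565 false positives, 1,055 true positives, 1,784 false negatives) are the ones used in the calculation above.

```r
# Hypothetical helper: total cost from confusion-matrix counts.
total_cost <- function(n_fp, n_tp, n_fn,
                       cost_readmit = 35000, cost_intervention = 1200,
                       reduction = 0.25) {
  intervention <- (n_fp + n_tp) * cost_intervention               # everyone flagged is treated
  readmission  <- ((1 - reduction) * n_tp + n_fn) * cost_readmit  # treated-but-readmitted + missed
  c(intervention = intervention, readmission = readmission,
    total = intervention + readmission)
}

current   <- total_cost(n_fp = 0,    n_tp = 0,    n_fn = 2839)   # no interventions at all
new_model <- total_cost(n_fp = 4565, n_tp = 1055, n_fn = 1784)

savings       <- current[["total"]] - new_model[["total"]]
pct_saved     <- 100 * savings / current[["total"]]
avoided       <- 0.25 * 1055            # readmissions prevented among true positives
pct_avoided   <- 100 * avoided / 2839

print(rbind(current, new_model))
print(c(savings = savings, pct_saved = pct_saved,
        avoided = avoided, pct_avoided = pct_avoided))
```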
#make predictions.
pred.2 = predict(readm.mod.2, newdata=readm.test, type="class")
#view the confusion matrix.
confusion.matrix.2 = table(readm.test$readmission, pred.2)
confusion.matrix.2
# Computation of accuracy, TPR and FPR for both models
accuracy <- sum(diag(confusion.matrix))/sum(confusion.matrix)
accuracy.2 <- sum(diag(confusion.matrix.2))/sum(confusion.matrix.2)
TPR <- confusion.matrix[2,2]/sum(confusion.matrix[2,])
TPR.2 <- confusion.matrix.2[2,2]/sum(confusion.matrix.2[2,])
FPR <- confusion.matrix[1,2]/sum(confusion.matrix[1,])
FPR.2 <- confusion.matrix.2[1,2]/sum(confusion.matrix.2[1,])
When the intervention cost is varied in steps of $200, the number of readmissions and the total monetary cost change as shown below.
The number of readmissions increases as the cost of intervention increases: a higher intervention cost in the loss matrix leads the model to classify more patients as "don't intervene" in order to minimize the total monetary cost, so fewer readmissions are prevented.
The total monetary cost also increases as the cost of intervention increases, both because each intervention costs more and because more 30-day unplanned readmissions occur.
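One way to see why a higher intervention cost pushes patients toward "don't intervene" is to look at the break-even readmission risk implied by the loss matrix: intervening is cheaper in expectation only when the risk exceeds FP penalty / (FP penalty + FN penalty). The sketch below is an illustration under that expected-cost assumption, not the analysis behind the results above; the function `break_even` and the grid of costs are hypothetical.

```r
# Hypothetical helper: readmission risk above which intervening is cheaper in expectation.
break_even <- function(cost_intervention, cost_readmit = 35000, reduction = 0.25) {
  fp_penalty <- cost_intervention
  # FN penalty = cost of a missed readmission minus cost of a treated one.
  fn_penalty <- cost_readmit - (cost_intervention + (1 - reduction) * cost_readmit)
  fp_penalty / (fp_penalty + fn_penalty)
}

costs <- seq(800, 1600, by = 200)   # illustrative grid in $200 steps (assumed range)
print(data.frame(intervention_cost = costs,
                 break_even_risk   = round(break_even(costs), 3)))
```

Under these assumptions the threshold simplifies to the intervention cost divided by $8,750 (25% of $35,000), so it rises steadily with the intervention cost and fewer patients clear it.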
Note: To obtain a CART model with a higher TPR, update the loss matrix with a higher value (penalty) for the false negative (FN). To obtain a lower FPR, update the matrix with a higher value (penalty) for the false positive (FP); the diagonal entries (TN and TP) must remain zero.
Data source:
The data are submitted on behalf of the Center for Clinical and Translational Research, Virginia Commonwealth University, a recipient of NIH CTSA grant UL1 TR00058 and a recipient of the CERNER data. John Clore (jclore '@' vcu.edu), Krzysztof J. Cios (kcios '@' vcu.edu), Jon DeShazo (jpdeshazo '@' vcu.edu), and Beata Strack (strackb '@' vcu.edu). This data is a de-identified abstract of the Health Facts database (Cerner Corporation, Kansas City, MO). URL: https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008