# Doctor AI: Predicting Clinical Events via Recurrent Neural Networks
Choi, Edward; Bahadori, Mohammad Taha; Sun, Jimeng (2015)
This paper presents an application of RNNs to the prediction of "clinical events", such as disease diagnoses, medication prescriptions, and their timing.
The paper proposes/suggests:
1. Applying an RNN to disease diagnosis, medication prescription and timing prediction.
2. "Initializing" the neural net with skipgrams instead of one-hot vectors. However, it seems from the description that the authors are not "initializing", rather just feeding a different feature vector into the RNN.
3. Initializing a model that is to be trained on a small corpus with a model trained on a large corpus works. Conclusion: information can be transferred between models (read: across hospitals).
+ Prediction of the diagnosis and drug codes of the next visit, using the diagnosis codes, medications, procedures, and dates of previous visits.
+ Prediction of when the next visit to the doctor will happen.
+ Sutter Health Palo Alto Medical Foundation - primary care - case-control study for heart failure.
![Dataset](https://raw.githubusercontent.com/tiagotvv/ml-papers/master/clinical-data/images/Choi2016_dataset.png?raw=true "Dataset information")
+ Patients with fewer than two visits were excluded.
+ ICD-9 diagnosis codes
+ GPI drug codes
+ CPT procedure codes
+ Records are time-stamped with the patient's visiting time.
+ If a patient receives multiple codes on the same visit, they all receive the same timestamp.
+ Code granularity - subcategories are grouped into broader categories:
+ ICD-9 truncated to 3 digits: 1183 unique codes
+ GPI drug class: 595 unique groups
+ Target: y = [diagnosis codes, drug codes] - a multi-hot vector of 1183 + 595 = 1778 dimensions (sketched below).
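A minimal sketch of how such a multi-hot target vector could be built. The code-to-index mappings and the example codes are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical lookup tables: 1183 grouped ICD-9 codes and 595 GPI drug classes.
icd9_to_idx = {"428": 0, "250": 1}   # e.g. heart failure, diabetes (3-digit groups)
gpi_to_idx = {"3610": 0}             # e.g. one drug class
N_DX, N_RX = 1183, 595

def visit_target(dx_codes, rx_codes):
    """Multi-hot target y = [diagnoses | drugs] with 1183 + 595 = 1778 dims."""
    y = np.zeros(N_DX + N_RX, dtype=np.float32)
    for code in dx_codes:
        y[icd9_to_idx[code]] = 1.0
    for code in rx_codes:
        y[N_DX + gpi_to_idx[code]] = 1.0
    return y

y = visit_target(["428"], ["3610"])
assert y.shape == (1778,) and y.sum() == 2.0
```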
+ Gated Recurrent Units (GRU)
![GRU architecture](https://raw.githubusercontent.com/tiagotvv/ml-papers/master/clinical-data/images/Choi2016_gru.png?raw=true "Gated Recurrent Unit (GRU)")
+ The input vector x is a sparse multi-hot encoding with roughly 40000 dimensions; the first layer reduces its dimensionality.
+ Two approaches to dimensionality reduction (embedding matrix W_emb):
+ W_emb is learned together with the model.
+ W_emb is pre-trained using techniques such as word2vec.
+ Loss function: cross entropy for the codes plus squared error for the time until the next visit.
+ Prediction layer for the codes: softmax. Prediction layer for the time of the next visit: ReLU. (A model sketch follows this list.)
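A minimal PyTorch sketch of this architecture, assuming the time gap between visits is fed alongside the embedded codes (the paper's implementation is in Theano; the layer sizes and the unit weighting of the two loss terms are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoctorAISketch(nn.Module):
    """Embedding -> GRU -> softmax over codes, plus a ReLU head for the next-visit time."""
    def __init__(self, n_codes=40000, emb_dim=2000, hidden=2000, n_targets=1778):
        super().__init__()
        # W_emb: learned jointly here; could instead be initialized from
        # pre-trained skip-gram vectors (see the RNN-IR variants below).
        self.emb = nn.Linear(n_codes, emb_dim, bias=False)
        self.gru = nn.GRU(emb_dim + 1, hidden, batch_first=True)  # +1 for the time gap
        self.code_head = nn.Linear(hidden, n_targets)
        self.time_head = nn.Linear(hidden, 1)

    def forward(self, x, dt):
        # x: (batch, visits, n_codes) multi-hot codes; dt: (batch, visits, 1) time gaps
        h, _ = self.gru(torch.cat([self.emb(x), dt], dim=-1))
        return F.softmax(self.code_head(h), dim=-1), F.relu(self.time_head(h))

def loss_fn(code_probs, time_pred, y_codes, y_time):
    # Cross entropy over the multi-hot code targets + squared error on the
    # (log) duration until the next visit.
    ce = -(y_codes * torch.log(code_probs + 1e-8)).sum(-1).mean()
    se = F.mse_loss(time_pred.squeeze(-1), y_time)
    return ce + se
```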
#### Experiments and Results:
+ Code available on GitHub: https://github.com/mp2893/doctorai
+ Implementation in Theano - Training with 2 Nvidia Tesla K80 GPUs
+ Dataset split: 85% training, 15% test.
+ RNN trained for 20 epochs.
+ L2 regularization on both the code-prediction coefficients and the next-visit-time coefficients (lambda = 0.001). Dropout between the GRU and the prediction layer (and between GRU layers when there is more than one).
+ 2000 neurons in the hidden layer
+ Frequency baseline: the codes from the previous visit are repeated as the prediction for the next visit - a good baseline for patients whose condition tends to stabilize over time (see the sketch after this list).
+ A variant uses the top-k most frequent codes from the patient's previous visits.
+ Logistic regression and a multilayer perceptron, each using the last 5 visits to predict the next one.
+ top-k recall emulates the behavior of physicians when making a differential diagnosis:
top-k recall = (# of true positives among the top k predictions) / (total # of true positives)
+ R^2 used to evaluate the performance of the next visit prediction.
+ The logarithm of the time between visits is predicted, to reduce the impact of very long intervals.
+ RNN-1: RNN with a single hidden layer, W_emb initialized with a random orthogonal matrix.
+ RNN-2: RNN with two hidden layers, W_emb initialized with a random orthogonal matrix.
+ RNN-1-IR: RNN with a single hidden layer, the embedding matrix W_emb initialized with skip-gram vectors trained on the entire dataset.
+ RNN-2-IR: RNN with two hidden layers, the embedding matrix W_emb initialized with skip-gram vectors trained on the entire dataset.
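A minimal sketch of the frequency baseline and the top-k recall metric described above; the function names and the toy example are mine, not from the paper's released code:

```python
import numpy as np

def topk_recall(scores, true_codes, k=10):
    """(# of true positives among the top-k predictions) / (# of true positives)."""
    topk = np.argsort(scores)[::-1][:k]
    hits = len(set(topk) & set(true_codes))
    return hits / len(true_codes)

def frequency_baseline(prev_visit_codes, n_targets=1778):
    """Frequency baseline: score exactly the codes seen in the previous visit."""
    scores = np.zeros(n_targets)
    scores[list(prev_visit_codes)] = 1.0
    return scores

# Toy example: previous visit had codes {3, 7}; the next visit truly has {3, 9}.
scores = frequency_baseline({3, 7})
print(topk_recall(scores, {3, 9}, k=10))  # 0.5 - only code 3 is recovered
```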
![Results](https://raw.githubusercontent.com/tiagotvv/ml-papers/master/clinical-data/images/Choi2016_results1.png?raw=true "Forecasting future medical activities")
+ Performance varies according to the number of patient visits:
+ Networks learn best when they observe more records.
+ Patients with frequent visits tend to be sicker, and their future is, in a sense, easier to predict.
![Number of visits](https://raw.githubusercontent.com/tiagotvv/ml-papers/master/clinical-data/images/Choi2016_results2.png?raw=true "Doctor AI performance as it knows more about the patient")
+ Performance of Doctor AI on other datasets:
+ Potential to transfer knowledge across hospitals: Doctor AI is pre-trained on the Sutter Health dataset and fine-tuned on the MIMIC II dataset (a minimal sketch follows).
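A minimal sketch of this fine-tuning step, reusing the DoctorAISketch model and loss_fn from the earlier sketch; the checkpoint name, optimizer choice, and dummy batch are assumptions:

```python
import torch

model = DoctorAISketch()  # defined in the architecture sketch above
# Hypothetical checkpoint from pre-training on the large (Sutter) dataset:
# model.load_state_dict(torch.load("sutter_pretrained.pt"))

opt = torch.optim.Adadelta(model.parameters())

# One fine-tuning step on a dummy MIMIC-II-style batch (2 patients, 3 visits each).
x = torch.zeros(2, 3, 40000); x[:, :, :5] = 1.0   # multi-hot input codes
dt = torch.ones(2, 3, 1)                          # time gaps between visits
y_codes = torch.zeros(2, 3, 1778); y_codes[:, :, 0] = 1.0
y_time = torch.ones(2, 3)                         # (log) durations to predict

code_probs, time_pred = model(x, dt)
loss = loss_fn(code_probs, time_pred, y_codes, y_time)
opt.zero_grad(); loss.backward(); opt.step()
```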
![Transfer knowledge](https://raw.githubusercontent.com/tiagotvv/ml-papers/master/clinical-data/images/Choi2016_results3.png?raw=true "Performance of Doctor AI in other datasets")
+ There is an interview about the paper on the [Data Skeptic](https://dataskeptic.com/blog/episodes/2017/doctor-ai) podcast.