APPLICATION OF AI IN KIDNEY TRANSPLANTATION
The first paper using AI techniques for the imaging-based evaluation of allografts was that of Hamilton et al. The authors used 99mTc-MAG3 captopril renography to evaluate the presence of renal artery stenosis in the allograft. They used a neural network-based classifier, with arteriography as the gold standard. After training, the neural network achieved an accuracy of 95%.
Some other papers also used AI techniques for the radiological evaluation of allografts with the aim of diagnosing acute rejection. El-Baz et al investigated the early detection of acute rejection using dynamic contrast-enhanced magnetic resonance imaging (MRI). The researchers automated data acquisition from the MRI using a three-step algorithmic approach and linked this data feed to a Bayesian supervised classifier to diagnose acute rejection. They also studied motion correction models to account for the local motion of the kidney due to patient movement and breathing. The resulting perfusion curves were then fed into the classifier to distinguish normal grafts from those with acute rejection.
Three additional papers from the same group examined the utility of computer-aided diagnostic (CAD) systems for the diagnosis of acute rejection[7-9]. In their first study, the authors used deep-learning algorithms, namely, ‘stacked non-negative constrained auto-encoders’, for the prediction of acute rejection. Their data feed consisted of diffusion-weighted MRI (DW-MRI) findings. In their second study, in addition to DW-MRI, creatinine clearance and creatinine values were also used in the data feed of convolutional neural network (CNN)-based classifiers. In both papers, the overall accuracy for correct diagnosis of acute rejection was above 90%. The authors proposed that their results demonstrated the potential of this new CAD system to reliably diagnose renal transplant rejection.
In a third study, they again assessed the utility of the CAD system for the diagnosis of acute rejection using DW-MRI and blood oxygen level-dependent MRI as the image-based sources. The authors also used laboratory data consisting of creatinine and creatinine clearance. In addition, they utilized a deep learning-based classifier, namely, ‘stacked autoencoders’, to differentiate non-rejection from acute rejection in renal transplants. The overall accuracy of the CAD system in detection of acute rejection was around 90%.
AI applications have also been used to assess allograft biopsies, where data feed for the classification algorithms was histological findings, molecular biomarkers, or a combination of the two.
Kazi et al used 12 histological features to train a Bayesian network with 110 transplant biopsies. Using the Bayesian network, a relatively inexperienced pathologist was able to make the correct diagnosis in 19 out of 21 cases. The researchers suggested that the integration of data with a computer can give a more consistent diagnosis of early acute rejection. In a follow-up study, the same researchers used a simple neural network for the decision process. They pointed out that in Bayesian networks the ‘importance’ attached to each histological feature has to be calculated and programmed into the network at the outset, which makes these networks relatively inflexible. A neural network offers greater flexibility, because the process of ‘training’ automatically calculates the ‘weight’ that should be allocated to each histological feature. The authors used 12 histological features, 100 transplant biopsies (43 with definite rejection), and 25 additional cases to train a single-layer neural network. Eventually, the network was able to correctly classify 19 out of the 21 new cases, leading to the conclusion that neural network technology can dramatically improve the accuracy of histological diagnosis of early acute renal allograft rejection.
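The flexibility argument can be made concrete with a small sketch: a single-layer network trained by gradient descent assigns its own weights to binary ‘histological features’. The data below are synthetic and purely illustrative, not the features or cases used by the authors.

```python
import random
from math import exp

random.seed(0)

N_FEATURES = 12  # one weight per histological feature (illustrative)

def make_case():
    # Hypothetical rule: 'rejection' (y = 1) whenever feature 0 or 1 is present
    x = [random.randint(0, 1) for _ in range(N_FEATURES)]
    y = 1 if x[0] + x[1] >= 1 else 0
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(cases, epochs=200, lr=0.5):
    # Single-layer network: the training loop assigns the feature weights itself
    w, b = [0.0] * N_FEATURES, 0.0
    for _ in range(epochs):
        for x, y in cases:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

cases = [make_case() for _ in range(100)]
w, b = train(cases)
accuracy = sum(1 for x, y in cases if predict(w, b, x) == y) / len(cases)
# The two informative features end up with large positive weights,
# with no hand-programmed 'importance' required.
```

No feature importance is specified anywhere in the code; it emerges from training, which is exactly the advantage the authors attribute to neural networks over the Bayesian approach.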
Marsh et al used deep learning algorithms to evaluate intraoperative donor kidney biopsies with the aim of determining which kidneys were eligible for transplantation. The authors used CNNs as a deep learning algorithm. The primary advantage of CNN is that the models can automatically discover prominent features from the data alone, without requiring a set of handcrafted parameters and extensive input normalization. Most recently, CNNs have been explored as primary tools for glomeruli detection. Different models were shown to be able to differentiate image patches containing isolated normal glomeruli from non-glomerular structures. Marsh et al trained the network with a total of 870 labeled sclerosed and 2997 labeled non-sclerosed glomeruli. The images were acquired from hematoxylin and eosin (HE)-stained frozen wedge donor biopsies. The fully convolutional model in the study showed a high correlation with percent global glomerulosclerosis (R2 = 0.828). The authors concluded that the performance of the CNN alone was equivalent to that of a board-certified clinical pathologist.
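The convolution operation that lets a CNN discover such image features can be illustrated with a minimal forward pass: a hand-written edge filter slid over a toy image patch, followed by ReLU and global max pooling. This is a conceptual sketch only; in a trained network such as that of Marsh et al, the filter values are learned from the labeled data rather than written by hand.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu_global_max(fmap):
    """ReLU then global max pooling: one scalar 'evidence' value per filter."""
    return max(max(max(v, 0) for v in row) for row in fmap)

# Toy 5x5 patch with a bright vertical boundary
# (a stand-in for a tissue-structure edge in a biopsy image)
patch = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter; a CNN would learn these weights
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

fmap = conv2d(patch, kernel)
response = relu_global_max(fmap)  # strong response where the edge is present
```

The filter responds strongly only where the intensity changes left to right, which is how stacked, learned filters come to encode glomerular boundaries and textures without handcrafted features.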
Liu et al examined the diagnosis of T-cell-mediated kidney rejection using a data feed acquired by RNA sequencing. The authors used three ML methods: linear discriminant analysis (LDA), SVM, and random forest (RF). The molecular signature discovery data set involved five kidney transplant patients with T-cell-mediated rejection (TCMR) and five with stable renal function. The prediction models were tested on 703 biopsies with Affymetrix GeneChip expression profiles available in the public domain. The LDA predicted TCMR in 55 of the 67 biopsies labeled TCMR, and 65 of the 105 biopsies designated as antibody-mediated rejection (ABMR). The RF and SVM models showed comparable performances. These data illustrated the feasibility of using RNA sequencing for the molecular diagnosis of TCMR.
Halloran et al and Reeve et al used molecular microscopy techniques to evaluate allograft biopsies, including molecular phenotyping with platforms such as microarrays that measure the expression of thousands of genes. To express the likelihood that particular diseases are present in the biopsy, the authors developed the TCMR score and the ABMR score assigned by classifiers (using weighted equations) built by standard ML methods. The authors also developed the Molecular Microscope Diagnostic System (MMDx) that assesses TCMR and ABMR in a reference set of biopsy samples using ML-derived classifier algorithms. Archetypal analysis and an additional 12 ML methods (individually or in ensembles) were used during the development of the MMDx. Archetypal analysis is a probabilistic, data-driven, unsupervised statistical approach that categorizes separate groups of patients (archetypes). The ensembles made diagnoses that were both more accurate than the best individual classifiers and almost as stable as the best, in line with previous studies from the ML literature. Human experts had about 93% agreement (balanced accuracy) signing out the reports, while RF-based automated sign-outs showed similar levels of agreement (92% and 94% for predicting the expert MMDx sign-outs for TCMR and ABMR, respectively).
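The reason ensembles out-diagnose their best individual members can be sketched with a toy majority vote over three simulated classifiers that err independently. The labels and error rates below are synthetic and bear no relation to the MMDx classifiers; the point is only the voting mechanism.

```python
import random

random.seed(3)

def noisy_classifier(truth, error_rate):
    """Simulate an imperfect classifier: flips each true label with some probability."""
    return [1 - y if random.random() < error_rate else y for y in truth]

def majority_vote(predictions):
    """Ensemble decision: each case gets the label most member classifiers chose."""
    return [1 if sum(votes) * 2 > len(votes) else 0
            for votes in zip(*predictions)]

def accuracy(truth, pred):
    return sum(1 for t, p in zip(truth, pred) if t == p) / len(truth)

truth = [random.randint(0, 1) for _ in range(1000)]          # synthetic diagnoses
members = [noisy_classifier(truth, 0.2) for _ in range(3)]   # three ~80% classifiers

ensemble = majority_vote(members)
member_accs = [accuracy(truth, m) for m in members]
ensemble_acc = accuracy(truth, ensemble)
# Because errors are independent, the vote corrects most single-model mistakes.
```

With independent errors, the ensemble is wrong only when at least two of three members err on the same case, which is why its accuracy exceeds the members' average; correlated errors would erode this gain.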
In 451 biopsy samples where feedback was obtained, clinicians indicated that the MMDx agreed with the clinical decision more commonly (87%) than histology did (80%) (P = 0.0042). In another study, the same group of researchers explored the frequency of rejection in areas of interstitial fibrosis and tubular atrophy (i-IFTA) in kidney transplant biopsies using Banff 2015 histology and the MMDx, and concluded that i-IFTA in indication biopsies reflected current parenchymal injury, often with simultaneous ABMR but seldom with TCMR.
Hermsen et al used whole-slide images of stained kidney transplant biopsies to develop and validate a CNN for histologic analysis in renal tissue stained with periodic acid Schiff. The researchers assessed the segmentation performance for different tissue classes and found that the best-segmented class was “glomeruli”, followed by “tubuli combined” and “interstitium”. The network detected 92.7% of all glomeruli in nephrectomy samples, with a false-positive rate of 10.4%. The authors also suggested that the CNN may have utility for quantitative studies involving kidney histopathology.
Aubert et al used archetype analysis to identify distinct groups of patients with transplant glomerulopathy. The researchers examined data from 552 biopsy samples taken from 385 patients with transplant glomerulopathy, using unsupervised archetypal analysis that integrated clinical, functional, immunologic, and histologic parameters. The authors identified five archetypes with distinct clinical, histologic, and immunologic features, as well as different outcomes (kidney allograft survival rates). The authors suggested that their approach made it possible to reduce patient heterogeneity and to create meaningful groups in terms of morphologic patterns, disease activity/progression, and risk of failure.
Kim et al used a fully automated system using CNN to identify regions of interest and to detect C4d positive and negative peritubular capillaries in gigapixel immune-stained slides. The authors used deep-learning-assisted labeling to enhance the performance of the detection method. Using this approach, they were able to train the CNN with a small number of samples. They suggested that their system was highly reliable, efficient, and effective for the detection of renal allograft rejection.
Finally, Ligabue et al evaluated the role of a CNN as a support tool for kidney immunofluorescence reporting and found that CNNs were 117 times faster than human inspectors in analyzing 180 test images. The accuracy of the CNN was comparable with that of experienced pathologists in the field.
Simic-Ogrizovic et al used data from 27 patients and 33 variables to train an artificial neural network (ANN) to predict chronic rejection progression, and suggested that the ANN seemed more reliable in predicting the course of chronic rejection than the usual statistical methods.
Lin et al examined single time-point models (LR and single-output ANNs) vs multiple time-point models (Cox models and multiple-output ANNs) to predict kidney transplant outcomes. The authors concluded that single time-point and multiple time-point models can achieve comparable area under the curve (AUC), except for multiple-output ANNs, which may perform poorly when a large proportion of observations are censored. LR can achieve similar performance as ANNs if there are no strong interactions or non-linear relationships among the predictors and the outcomes.
Akl et al developed an ANN model to predict the 5-year graft survival in living-donor kidney transplants. Estimates from the validated ANNs were compared using Cox regression-based nomograms. Researchers used data from 1581 patients for training and 319 patients for validation. The positive predictive value of graft survival was 82.1% and 43.5% for the ANNs and Cox regression-based nomogram, respectively. The authors concluded that ANNs were more accurate and sensitive than the Cox regression-based nomogram in predicting 5-year graft survival.
Lofaro et al used two different classification trees to predict chronic allograft nephropathy (CAN) within 5 years after transplantation by evaluating 80 renal transplant patients’ routine blood and urine tests collected after 6 mo of follow-up, and concluded that the use of classification trees is an acceptable alternative to traditional statistical models, especially for the evaluation of interactions of risk factors.
Greco et al also used DTs to build predictive models of graft failure and retrospectively studied 194 renal transplant patients with 5 years of follow-up. The primary endpoint was graft loss within 5 years of follow-up. In the classification algorithm, the researchers studied the following parameters: Age, gender, time on dialysis, donor type, donor age, human leukocyte antigen (HLA) mismatches, delayed graft function (DGF), acute rejection episode, CAN, and body mass index and concluded that the use of DTs in clinical practice may be an acceptable alternative to the traditional statistical methods.
For the evaluation of the 3-year graft survival in kidney recipients with systemic lupus erythematosus (SLE), Tang et al applied classification trees, LR, and ANNs to the data describing kidney recipients with SLE retrieved from the United States Renal Data System database. The 95% confidence interval of the area under the receiver-operator characteristic curve (AUROC) was used to quantify the discrimination capacity of the prediction models. The authors concluded that the performance of LR and classification trees was not inferior to that of more complex ANN.
Yoo et al assessed the predictive power of ensemble learning algorithms [survival DT, bagging, RF, and ridge and least absolute shrinkage and selection operator (LASSO)] and compared their outcomes to those of the conventional models (DT and Cox regression) to predict graft survival in a retrospective analysis of the data from a multicenter cohort of 3117 kidney transplant recipients. With the survival DT model, the index of concordance was found to be 0.80, and an episode of acute rejection during the first post-transplant year was associated with a 4.27-fold increase in the risk of graft failure. In conclusion, the authors reported that ML methods may provide flexible and practical tools for predicting graft survival.
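The index of concordance cited here is conventionally Harrell's c-index: among all comparable patient pairs, the fraction in which the model assigns the higher risk to the patient who fails earlier. A minimal sketch on a hypothetical four-patient cohort (not data from the study), including the handling of censored follow-up:

```python
def concordance_index(times, events, risk):
    """Harrell's c-index. times: follow-up times; events: 1 = graft failure
    observed, 0 = censored; risk: model risk score (higher = worse prognosis).
    A pair (i, j) is comparable only if i's failure is observed before j's
    follow-up ends."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk failed earlier: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5   # ties count half
    return concordant / comparable

# Hypothetical cohort (illustrative numbers only)
times  = [2, 5, 7, 9]           # years to failure or censoring
events = [1, 1, 0, 1]           # patient 3 is censored
risk   = [0.9, 0.1, 0.2, 0.4]   # model-predicted risk scores

cindex = concordance_index(times, events, risk)  # → 0.6
```

A c-index of 0.5 is chance-level ranking and 1.0 is perfect; the 0.80 reported for the survival DT model sits between these benchmarks.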
In a cross-sectional study, Nematollahi et al examined the 5-year graft survival in 717 patients, using a multilayer perceptron of ANN (MLP-ANNs), LR, and SVMs to construct prediction models. The authors assessed the validity of the models using different evaluation tools such as AUC, accuracy, sensitivity, and specificity and concluded that the SVM and MLP-ANN models could efficiently be used for survival prediction in kidney transplant recipients.
Tapak et al compared the LR and ANN approaches to predict graft survival in their data set from a retrospective study of 378 patients. According to their analysis, the ANN model outperformed LR in the prediction of kidney transplantation failure. The ANN model showed a higher total accuracy (0.75 vs 0.55) and better area under the ROC curve (0.88 vs 0.75) when compared to LR.
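The AUC figures compared in these studies have a direct probabilistic reading: the chance that a randomly chosen failure case receives a higher predicted risk than a randomly chosen non-failure case. A small sketch of that rank-based computation with hypothetical scores (not data from Tapak et al):

```python
def roc_auc(labels, scores):
    """AUC as the normalized Mann-Whitney U statistic: the fraction of
    positive/negative pairs where the positive case scores higher
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]            # 1 = graft failure (hypothetical)
scores = [0.8, 0.4, 0.5, 0.2, 0.1]  # model-predicted failure probability

auc = roc_auc(labels, scores)       # 5 of 6 pairs correctly ordered
```

Because AUC depends only on the ranking of scores, it lets an LR and an ANN be compared even when their raw outputs are on different scales.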
Zhou et al assessed the association of 17 proteins with allograft rejection in a cohort of 47 patients. The researchers used the LASSO variable selection method to select the significant proteins that predict the hazard of allograft loss. Conventional model selection techniques adopt the strategy of best subset selection or one of its stepwise variants. However, such a strategy is computationally infeasible when the number of predictors is large. Moreover, the subset selection method may be numerically unstable, and thus the resulting model may suffer from poor prediction accuracy. As one of the most popular variable selection methods, LASSO is able to overcome the computational hurdle of the subset selection approach. The authors deduced that KIM-1 and VEGF-R2 had individual significant positive associations with the hazard of renal failure.
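The mechanism by which LASSO performs selection, soft-thresholding that drives uninformative coefficients exactly to zero, can be sketched with a plain coordinate-descent implementation on synthetic standardized data. This is a linear-regression sketch for illustration, not the hazard model Zhou et al fitted, and the 'protein markers' below are simulated.

```python
import random
from math import sqrt

random.seed(1)

def soft_threshold(rho, lam):
    """The LASSO update: shrink toward zero, and snap to exactly zero
    when the signal does not exceed the penalty."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, iters=200):
    """Coordinate descent; assumes each column of X has unit L2 norm."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual: leave feature j's own contribution out
            r = [y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            w[j] = soft_threshold(rho, lam)
    return w

# Synthetic 'protein panel': only the first two markers carry signal
n, p = 60, 6
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
for j in range(p):  # normalize columns to unit L2 norm
    norm = sqrt(sum(X[i][j] ** 2 for i in range(n)))
    for i in range(n):
        X[i][j] /= norm
y = [3.0 * X[i][0] - 2.0 * X[i][1] + random.gauss(0, 0.05) for i in range(n)]

w = lasso(X, y, lam=0.5)
# The four noise markers are driven exactly to zero; the two real
# markers survive (shrunk toward zero by the penalty).
```

This exact-zero behavior is what distinguishes LASSO from ridge regression and is why it doubles as a variable selection method.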
In a study conducted to predict the future values of estimated glomerular filtration rate (eGFR) for kidney recipients, Rashidi Khazaee et al developed and validated an ANN-based model (multilayer perceptron network) using three static covariates of the recipients’ gender and the donors’ age and gender, as well as 11 dynamic covariates of the recipients including current age, time since transplant, serum creatinine, fasting blood sugar, weight, and blood pressures available at each visit. The development and validation datasets included 72.7% and 27.3%, respectively, of the 25811 records from the historical visit data of 675 adult kidney recipients. The ANN-based model dynamically predicted a future eGFR value based on a number of fixed and time-dependent longitudinal data. The authors suggested that using such analytical tools may help in realizing the administration of personalized medicine in kidney transplantation.
In another study, Mark et al used an ensemble of methods including random survival forests constructed from conditional inference trees. The benefit of combining diverse models to predict kidney transplant survival is that different models may work better than others on different cohorts of the data. The dataset was provided by the United Network for Organ Sharing and consisted of recipients who had kidney transplant surgery in the United States from 1987 to 2014[36,37]. The authors used 73 variables of the 163199 observations available during the chosen 10-year time period and proposed that the model achieved a better performance than the estimated post-transplant survival model used in the kidney allocation system in the United States.
In a multicenter study, Raynaud et al analyzed 403497 eGFR measurements of 14132 patients using a number of different ML techniques and identified eight distinct eGFR trajectories with latent class mixed models. Using a validation cohort of 9992 individuals, the authors suggested that their results provided the basis for a trajectory-based assessment of kidney transplant patients for risk stratification and monitoring.
In a critical paper, Bae et al examined whether ML techniques are superior to conventional regression analysis. Studying the records of 133431 adult deceased donor kidney transplant recipients from the national registry data, the authors randomly selected 70% of the transplant centers for training and 30% for validation. They used different ML procedures (gradient boosting and RF) and regression analysis, with the aim of predicting DGF, 1-year acute rejection, death-censored graft failure, all-cause graft failure, and death in the training set. After comparing the performances of different models in the validation set, the authors asserted that ML does not outperform the conventional regression-based approaches in predicting various kidney transplant outcomes.
Optimizing the dose of immunosuppression
McMichael et al developed an intelligent dosing system (IDS) for optimizing FK 506 therapy, describing the computerized dosing algorithm as an “expert system” based on stochastic open-loop control theory. The IDS was designed to predict drug dosages and levels and was programmed with hundreds of dosing histories, i.e., previous dose, previous level, current dose, and current level. The system was then used as a model to develop an equation relating the current FK 506 dose and level to the desired dose and level, from which the IDS calculates the FK 506 dose required to achieve the target level. A prospective validation study showed that the model was 95% accurate in describing the relationship between FK 506 dosage and FK 506 plasma level, and that there were no biases in the dosing predictions.
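Conceptually, such a system learns a dose-to-level relationship from dosing histories and then inverts it to propose the next dose. A deliberately simplified sketch, assuming a proportional dose-trough relationship and entirely hypothetical numbers (the actual IDS used a far richer stochastic control model):

```python
def fit_slope(doses, levels):
    """Least-squares slope through the origin: level ≈ k * dose.
    A crude stand-in for a patient's learned dose-to-trough relationship."""
    num = sum(d * l for d, l in zip(doses, levels))
    den = sum(d * d for d in doses)
    return num / den

def suggest_dose(doses, levels, target_level):
    """Invert the fitted relationship to hit the desired trough level."""
    k = fit_slope(doses, levels)
    return target_level / k

# Hypothetical dosing history: (dose in mg, trough level in ng/mL)
doses  = [2.0, 3.0, 4.0]
levels = [5.1, 7.4, 10.2]

next_dose = suggest_dose(doses, levels, target_level=8.0)
```

Real tacrolimus kinetics are nonlinear and patient-specific, which is precisely why the authors trained their system on hundreds of dosing histories rather than a single proportionality constant.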
Camps-Valls et al used neural networks for personalizing the dosage of cyclosporine A (CyA) in patients who had undergone kidney transplantation. The researchers used three kinds of networks [multilayer perceptron, finite impulse response (FIR) network, and the Elman recurrent network], while neural-network ensembles were used in a scheme of two chained models, where the blood concentration predicted by the first model constituted an input to the dosage prediction model. After using 364 samples from 22 patients for training and 217 samples from 10 patients for testing, the authors concluded that the best model was an ensemble of the FIR and Elman networks. This model yielded an r value of 0.977 in the validation set. The authors also suggested that neural models were well suited to this problem not only because of the accuracy of their estimations but also because of their precision and robustness.
In Gören et al’s study, 654 CyA measurements and 20 input parameters from 138 patients were used to train (473 samples) and validate (181 samples) an adaptive-network-based fuzzy inference system. The model aimed at predicting CyA concentration based on 20 input parameters which included concurrent use of drugs, blood levels, sampling time, age, gender, and dosing intervals. The authors measured the performance of the developed model using the root-mean-square error, which was calculated as 0.057 for the validation set. In conclusion, the researchers suggested that their model could effectively assist physicians in choosing the best therapeutic drug dose in the clinical setting.
In two consecutive papers, Seeling et al described the development of a computer-aided decision system for planning tacrolimus therapy and then the integration of this system into the hospital information system. The authors used data from 492 patients and 13053 examinations, and created a classification model (conditional inference trees) using patient profiles, associated distributions, and intervals of medication adaptation (decrease, increase, or maintain). The theoretical model resulted in 16 classes of patients and associated distributions, which were then translated to a medical logic module. Eventually, a method for determining semi-automated immunosuppressive therapy was created to guide nephrologists.
Using data from 1045 renal transplant patients, Tang et al utilized 80% of the randomly selected data to develop a dose prediction algorithm and employed the remaining 20% for validation. Multiple linear regression, ANN, regression tree (RT), multivariate adaptive regression splines, boosted RT, support vector regression, RF regression, LASSO regression, and Bayesian additive RT were applied, and their performances were compared. Among all the ML models, RT performed best in both the derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. The authors suggested that the ML models used to predict the tacrolimus dose may facilitate the administration of personalized medicine.
In Thishya et al’s study, the ANN and LR models were used to predict the bioavailability of tacrolimus and the risk of post-transplant diabetes based on the ABCB1 and CYP3A5 genetic polymorphism status. Besides the polymorphisms, the authors used the age, gender, BMI, and creatinine data from 129 patients for the input layer of their ANN and concluded that the ANN and multifactor dimensionality reduction analysis models explored both the individual and synergistic effects of variables in modulating the bioavailability of tacrolimus and risk for post-transplant diabetes.
Diagnosis of rejection
Hummel et al examined 145 patients who had kidney biopsy for the differential diagnosis of nephrotoxicity and acute cellular rejection using 18 different clinical and laboratory values for the input parameters, including tacrolimus dose, serum creatinine, and histocompatibility, to train the ANN. The classification results were considered significant by the experts who evaluated the classifiers. However, the researchers asserted that higher rates of sensitivity would be required to apply the classifier in clinical practice. In a separate paper, the same group of authors used the same database to examine the performance of different AI techniques to screen the need for biopsy among patients suspected of having nephrotoxicity or acute cellular rejection during the first year after transplantation. They used the ANN, SVM, and Bayesian inference (BI) to indicate if the clinical course of the event suggested the need for biopsy. The technique that showed the best sensitivity value as an indicator for biopsy was the SVM with an AUC of 0.79. The authors suggested that this technique could be used in clinical practice.
In Metzger et al’s study, SVM-based classification was used to distinguish rejection from non-rejection. The researchers examined 103 patients (39 for training and 64 for validation) with a kidney biopsy and used CE-MS-based urinary proteome analysis for the data feed. The application of the rejection model to the validation set resulted in an AUC value of 0.91. In total, 16 out of the 18 subclinical rejections, all 10 clinical rejections (BANFF grades Ia/Ib), and 28 of the 36 controls without rejection were correctly classified.
Pineda et al developed an integrative computational approach leveraging donor/recipient (D/R) exome sequencing and gene expression to predict the clinical post-transplant outcome. The authors performed a statistical analysis of 28 D/R kidney transplant pairs with biopsy-proven clinical outcomes, identifying a significantly higher number of mismatched non-HLA variants in antibody-mediated rejection (AMR). They also identified 123 variants associated mainly with the risk of AMR and applied an ML technique to circumvent the issue of statistical power. Eventually, they found a subset of 65 variants using RF that predicted post-transplant AMR with a very low error rate.
In another study, the same group of authors evaluated 37 biopsy-paired peripheral blood samples from a cohort with stable kidney function with AMR and TCMR by RNA sequencing. The authors used ML tools to identify the gene signatures associated with rejection and found that 102 genes (63 coding genes and 39 noncoding genes) associated with AMR (54 upregulated), TCMR (23 upregulated), and stable kidney function (25 upregulated) perfectly clustered with each rejection phenotype and highly correlated with main histologic lesions (P = 0.91). Their analysis identified a critical gene signature in peripheral blood samples from kidney transplant patients who underwent AMR, and this signature was sufficient to differentiate them from patients with TCMR and immunologically quiescent kidney allografts.
Wittenbrink et al used a pretransplant HLA antigen bead assay data set to predict the risk of post-transplant acute cellular rejection (ACR). Employing an SVM-based algorithm to process and analyze the HLA data, the model predicted the 38 graft recipients who experienced ACR with an accuracy of 82.7%. The authors reported that this was one of the highest prediction accuracy rates in the literature for pre-transplant risk assessment of ACR.
Prediction of early graft function
Shoskes et al used retrospective data from 100 cadaveric transplants to train an ANN with the aim of predicting DGF. For input, the authors used donor and recipient characteristics and then validated the model in 20 prospective cadaveric transplants. In the validation cohort, the ANN was able to predict DGF with an 80% accuracy. The authors suggested that the use of such a model could help improve donor/recipient selection and perioperative immunosuppression and reduce overall costs.
In Brier et al’s study, the researchers used an ANN and LR to predict DGF. In the examination of 304 cadaveric kidney transplantations, the researchers used data from 198 patients for training and 106 patients for validation. The results of the study showed that LR analysis was more sensitive in predicting ‘no DGF’ (91% vs 70%), while the ANN predicted ‘DGF’ with a higher sensitivity (56% vs 37%). The neural network was 63.5% sensitive and 64.8% specific. In conclusion, the authors deduced that ANN may be used for prediction of DGF in cadaveric renal transplants.
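The sensitivity and specificity figures quoted throughout these studies come directly from the confusion matrix. A brief sketch with hypothetical predictions (not data from Brier et al):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical validation set: 1 = DGF occurred
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
# 3 of 4 DGF cases caught; 5 of 6 non-DGF cases correctly cleared
```

Note that a model can trade one metric for the other by moving its decision threshold, which is why the LR and ANN in this study could each 'win' on a different class.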
Santori et al assessed the efficiency of a neural network model to forecast a delayed decrease of serum creatinine in pediatric kidney recipients. In this study, the neural network was constructed with a training set of 107 pediatric kidney recipients, using 20 input variables. The model was validated in a second set of 41 patients. The overall accuracies of the neural network for the training set, the validation set, and the whole patient cohort were 89.1%, 76.92%, and 87.14% respectively. The developed ANN model had a higher sensitivity compared to LR analysis. The authors inferred that the neural network model could be used to predict a delayed decrease in serum creatinine among pediatric kidney recipients.
In another study, Decruyenaere et al constructed eight different ML methods to predict DGF and compared them to LR by using the data from 475 cadaveric kidney transplantations. Besides LR, the authors employed the following methods to construct the prediction models: LDA, quadratic discriminant analysis, SVMs using linear, radial basis function and polynomial kernels, DT, RF, and stochastic gradient boosting. The performance of the models was assessed by computing sensitivity, positive predictive value, and AUROC after a 10-fold-stratified cross-validation. The authors found that the linear SVM had the highest discriminative capacity (AUROC: 84.3%), outperforming each of the other methods, except for the radial SVM, polynomial SVM, and LDA. However, it was the only method superior to LR. Eventually, the authors asserted that the linear SVM was the most appropriate ML method to predict DGF.
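Stratified cross-validation, as used here, keeps the outcome prevalence constant across folds, which matters when the event (DGF) is the minority class. A minimal sketch of the splitting logic with a hypothetical cohort (the classifier itself is omitted):

```python
import random

random.seed(2)

def stratified_folds(labels, k):
    """Return k folds of indices, each preserving the overall class balance."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        random.shuffle(idxs)               # randomize within each class
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)       # deal indices round-robin
    return folds

# Hypothetical cohort: DGF (label 1) in 30% of 200 recipients
labels = [1] * 60 + [0] * 140
folds = stratified_folds(labels, 10)

# Every fold of 20 contains exactly 6 DGF cases -- the 30% prevalence.
counts = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold then serves once as the held-out test set while a model is trained on the other nine, and the per-fold metrics (sensitivity, AUROC, etc.) are averaged.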
In Costa et al’s evaluation of the impact of donor maintenance-related variables (arterial blood gas pH, serum sodium, blood glucose, urine output, mean arterial pressure, vasopressor use, and reversed cardiac arrest) on the development of DGF, data from 443 cadaveric donors were analyzed with ML methods that included DT, neural network, and SVM to locate donor maintenance-related parameters predictive of DGF. However, according to the multivariable LR analysis, the donor maintenance-related variables did not have any impact on DGF occurrence.
In a large scale study, Kawakita et al aimed to build personalized prognostic models based on ML methods to predict DGF. Using the data obtained from the United Network for Organ Sharing/Organ Procurement and Transplantation Network, their development set included a total of 55044 patients and the validation set included 6176 patients. Of the selected 26 predictors, 13 were donor-related, eight were recipient-related, and five were transplant-related. The authors used a development dataset with the selected features to train five ML algorithms: LR, elastic net, RF, extreme gradient boosting (XGB), and ANN. For performance comparison, a baseline model based on LR was developed. After training the ML algorithms, the authors assessed each model for three performance measures: Discrimination, calibration, and clinical utility using different metrics. All of the algorithms trained with the new predictors performed better than or as well as the baseline model on these measures, especially the ANN and XGB. XGB is an ensemble learning method that assembles DTs as building blocks to build a strong learner able to capture the nonlinear relationships between the predictors and the outcome. The authors suggested that ML was a valid alternative approach for the prediction and identification of the predictors of DGF, adding an important piece of evidence to support the use of ML in driving medical progress.
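The boosting idea behind XGB, fitting each new tree to the residual errors of the ensemble built so far, can be sketched with depth-1 trees (stumps) and squared error on toy data. This is conceptual only: XGBoost adds regularization, second-order gradients, and many engineering refinements on top of this scheme.

```python
def best_stump(x, residuals):
    """Depth-1 regression tree on one feature: pick the threshold whose
    two leaf means give the smallest squared error on the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, rounds=20, lr=0.3):
    """Gradient boosting for squared error: each stump fits the current
    residuals, and a damped copy of it is added to the ensemble."""
    pred = [sum(y) / len(y)] * len(y)  # start from the mean
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = best_stump(x, resid)
        pred = [pi + lr * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return pred

# Toy 1-D data with a nonlinear, stepped relationship
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 5, 5, 9, 9]

pred = boost(x, y)
mse = sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)
# The ensemble of weak stumps fits the step pattern that no single stump could.
```

No individual stump can represent the three-level pattern, but the additive ensemble reduces the error round by round, which is the sense in which boosting turns weak learners into a strong one.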
In addition to the above-mentioned areas, AI techniques are used in kidney transplantation for different purposes. We located articles on the following topics: Assessment of risk for various complications such as cardiovascular risk, pneumonia[59,60], and CMV infection, prediction of changes in lipid parameters, prediction of HLA response[63-65], and assessment of the risk of kidney transplantation during the coronavirus disease 2019 pandemic.