This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: Stapff MP developed the scientific concept, literature search, study design, applied the data querying, result interpretation, scientific discussion, and prepared the manuscript.
Institutional review board statement: As a federated network, TriNetX received a waiver from Western IRB, since only aggregated counts and statistical summaries of de-identified information, but no protected health information, are received, and no study-specific activities are performed in retrospective analyses.
Informed consent statement: This was an observational study based on analyses of anonymized electronic medical records describing real world treatment. No intervention or any study-specific activity was done. Therefore, no informed consent was necessary, and obtaining it would not have been feasible given the anonymized and retrospective character of the analysis.
Conflict-of-interest statement: The author is an employee of TriNetX Inc., the data network and analytics platform used for this publication. TriNetX as a company was not involved in the design of the study; the collection, analysis, and interpretation of data; writing the report; or the decision to submit the report for publication. The author does not declare conflicting interests (including but not limited to commercial, personal, political, intellectual, or religious interests).
STROBE statement: The author has read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Corresponding author to: Manfred Paul Stapff, MD, PhD, Chief Medical Officer, CMO, TriNetX Inc., 125 Cambridgepark Drive, Ste 500, Cambridge, MA 02140, United States. firstname.lastname@example.org
Received: August 22, 2018 Peer-review started: August 22, 2018 First decision: October 4, 2018 Revised: October 10, 2018 Accepted: November 15, 2018 Article in press: November 15, 2018 Published online: December 15, 2018
To evaluate the effect on cardiovascular outcomes of sodium-glucose co-transporter-2 (SGLT2) inhibitors in a real world setting by analyzing electronic medical records.
We used TriNetX, a global federated research network providing statistics on electronic health records (EHR). The analytics subset contained EHR from approximately 38 million patients in 35 Health Care Organizations in the United States. The records of 46909 patients who had taken SGLT2 inhibitors were compared to those of 189120 patients who had taken dipeptidyl peptidase (DPP) 4 inhibitors. We identified five potential confounding factors and built corresponding strata: older age (≥ 60 years), hypertension, chronic kidney disease (CKD), co-medication with insulin, and co-medication with metformin. Cardiovascular events were counted as stroke (ICD10 code: I63) or myocardial infarction (ICD10 code: I21) occurring within three years after the first instance of the respective medication in the patients’ records.
Of the 46909 patients with SGLT2 inhibitors in their EHR, 1667 patients (3.6%) had an ICD code for stroke or for myocardial infarction within the first three years after the first instance of the medication. In the control group, there were 10680 events among 189120 patients (5.6%), which represents a risk ratio of 0.63 (95%CI: 0.60-0.66). The overall incidence of stroke or myocardial infarction in the strata with a potential confounding risk factor ranged from 4.9% in patients taking metformin to 12.5% in the stratum with the highest risk (concomitant CKD). In all strata, the difference in risk of experiencing a cardiovascular event was similarly in favor of SGLT2 vs control, with risk ratios ranging from 0.62 to 0.81.
Real world data replicated the results from randomized clinical trials, confirmed the cardiovascular advantages of SGLT2 inhibitors, and showed their applicability to the US population.
Core tip: Cardiovascular advantages of sodium-glucose co-transporter-2 (SGLT2) inhibitors were shown in complex clinical trials or in countries with large registries. However, it was unclear whether these findings could be applied to routine medical practice in the US. This real world analysis of 46909 patients with SGLT2 inhibitors revealed a risk ratio of 0.63 (95%CI: 0.60-0.66) compared to 189120 patients with dipeptidyl peptidase 4 inhibitors. This analysis of electronic health records could replicate the results of randomized clinical trials, which supports the usefulness of such real world studies (e.g., for long-term outcome or safety observations).
Citation: Stapff MP. Using real world data to assess cardiovascular outcomes of two antidiabetic treatment classes. World J Diabetes 2018; 9(12): 252-257
An estimated 30.3 million people of all ages (or 9.4% of the United States population) had diabetes in 2015. It is expected that the world prevalence of diabetes among adults will increase to 7.7%, or 439 million adults, by 2030. Between 2010 and 2030, there will be a 69% increase in the number of adults with diabetes in developing countries, and a 20% increase in developed countries.
While short-term treatment targets focus on the normalization of values for glucose and hemoglobin A1c, the long-term objective is to avoid late-stage complications of diabetes and end-organ damage. Up to 70% of patients with type II diabetes mellitus (T2DM) also have arterial hypertension and are thus exposed to an increased risk of experiencing a stroke or heart attack. It is therefore important that treatment paradigms for T2DM consider the long-term cardiovascular risk.
In 2015, the EMPA-REG OUTCOME trial found a significant mortality benefit of sodium-glucose co-transporter-2 (SGLT2) inhibitors vs placebo. Because the findings were unexpected, unprecedented and not linked to obvious mechanistic pathways, it was suggested that the results be replicated in future investigations. Recently, CVD-REAL Nordic, a multinational observational study, analyzed the cardiovascular mortality and morbidity in patients with T2DM following initiation of SGLT2 inhibitors. CVD-REAL Nordic was an observational analysis of individual patient-level data from national registries in three Scandinavian countries, showing that SGLT2 inhibitor use was associated with reduced cardiovascular disease and cardiovascular mortality.
The objective of the following analysis was to support or contradict the results of EMPA-REG OUTCOME and CVD-REAL Nordic by using electronic medical records (EMR) from a predominantly United States-based research network, thus evaluating the representativity of these results outside the experimental setting of a randomized clinical trial and beyond a European population, respectively.
MATERIALS AND METHODS
We used TriNetX, a global federated research network providing access to statistics on EMR (diagnoses, procedures, medications, laboratory values, genomic information). The analytics subset allowed the analysis of approximately 38 million patients in 35 large Health Care Organizations predominantly in the United States. As a federated network, TriNetX received a waiver from Western IRB, since only aggregated counts and statistical summaries of de-identified information, and no protected health information, are received. In addition, no study-specific activities are performed in retrospective analyses. Details of the network have been described elsewhere[6-8]. All analyses were done in the TriNetX “Analytics” network using the browser-based real-time analytics features.
At the time of the analysis in June 2018, we analyzed the EMR of 46909 patients in the network who had an instance of any SGLT2 inhibitor (empagliflozin, dapagliflozin or canagliflozin) at any time within the past ten years in their electronic medical record. As a comparison group, we chose patients who had taken dipeptidyl peptidase (DPP) 4 inhibitors (linagliptin, alogliptin, sitagliptin or saxagliptin) during the same time, and found 189120 patients. Using a Bayesian statistical approach on demographics and pre-existing (baseline) comorbidities of the two groups, we identified five potential confounding factors and built strata with the following criteria: age ≥ 60 years, presence of hypertension [International Classification of Diseases (ICD)10 code I10], presence of CKD (ICD10 code N18), co-medication with insulin, and co-medication with metformin. Analyzing the strata separately allowed us to address potential bias in the federated data model without direct access to the individual data sets on the patient level.
Cardiovascular events were counted by selecting any stroke (ICD10 code I63) or myocardial infarction (ICD10 code I21) occurring during a three-year observation period after the first instance of the above mentioned medications in the patients’ records.
The risks of experiencing an event in each stratum were calculated by dividing the number of patients with an event (numerator) by the total number of patients with the respective medication in each stratum (denominator). The risk ratios for SGLT2 inhibitors vs the comparison group were calculated by dividing the risk for each SGLT2 stratum by the risk in each corresponding DPP4 stratum.
Of the 46909 patients taking SGLT2 inhibitors, 1667 patients (3.6%) had an ICD code for stroke or myocardial infarction during their three-year observation period, compared to 10680 of 189120 (5.6%) in the control group (Table 1). This translates into a risk ratio of 0.63 without any correction for potential bias (P < 0.001; 95%CI: 0.60-0.66).
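The calculation described above can be sketched in Python using the overall cohort counts reported in the text. The function name `risk_ratio_with_ci` is hypothetical, and the log-based (Katz) confidence interval is an assumption, since the article does not state which interval method the platform applied. This is an illustrative sketch, not the author's implementation.

```python
import math
from statistics import NormalDist


def risk_ratio_with_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Return (risk_a, risk_b, rr, ci_low, ci_high) for group A vs group B."""
    # Risk = patients with an event / all patients in the group.
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR), Katz log method. ASSUMPTION: the article
    # does not say how its confidence interval was derived.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    ci_low = math.exp(math.log(rr) - z * se_log_rr)
    ci_high = math.exp(math.log(rr) + z * se_log_rr)
    return risk_a, risk_b, rr, ci_low, ci_high


# Overall cohort counts as reported in the text (SGLT2 vs DPP4 control).
risk_sglt2, risk_dpp4, rr, ci_low, ci_high = risk_ratio_with_ci(
    1667, 46909, 10680, 189120
)
print(f"risk SGLT2 = {risk_sglt2:.1%}, risk DPP4 = {risk_dpp4:.1%}")
print(f"RR = {rr:.2f} (95%CI: {ci_low:.2f}-{ci_high:.2f})")
```

Under the stated assumption, the sketch yields risks of 3.6% and 5.6% and a risk ratio of 0.63 (95%CI: 0.60-0.66), in agreement with the values reported above. The same function could be applied to the counts of each stratum.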
Table 1 Patient characteristics and results before correcting for potential confounding factors.
[Table body not recovered. Surviving row labels: LDL cholesterol (mg/dL); HDL cholesterol (mg/dL); after index event: total stroke (I63) or MI (I21), n in group, percent in group, RR SGLT2 vs control.]
SGLT2: Sodium-glucose co-transporter-2; RR: Risk ratio; SD: Standard deviation; CKD: Chronic kidney disease; LDL: Low density lipoprotein; HDL: High density lipoprotein; MI: Myocardial infarction.
SGLT2 inhibitors carry a contraindication for renal insufficiency. Indeed, the percentage of patients with CKD was only 4% in the SGLT2 group, compared to 8% in the control group. While the groups were similar in gender distribution (53% and 52% male, respectively) and in low density lipoprotein and high density lipoprotein levels, the SGLT2 group was younger than the control group (mean age 59 vs 66) and had more patients with concomitant hypertension (45% vs 41%). There were also differences in the use of insulin (32% vs 19%) and metformin (52% vs 33%). To account for these potential confounding factors, strata were built for age ≥ 60 years, CKD, hypertension, and antidiabetic co-medication (insulin and metformin). The overall incidence of stroke or myocardial infarction in each stratum ranged from 4.9% to 12.5%. In all strata, the difference in the risk of experiencing a cardiovascular event in the SGLT2 group vs control was similarly in favor of SGLT2, with risk ratios ranging from 0.62 (co-medication insulin) to 0.81 (patients with CKD) (Table 2).
Table 2 Results from the patient subgroups (strata) with potential confounding factors.
Drug therapy of type II diabetes mellitus should both bring glucose and hemoglobin A1c values into an acceptable and stable range, and reduce the likelihood of end organ damage or cardiovascular events.
Several studies and meta-analyses have suggested a positive effect on cardiovascular outcomes by the SGLT2 inhibitor class[11,12]. EMPA-REG OUTCOME and CANVAS were randomized placebo controlled prospective trials that used empagliflozin and canagliflozin, respectively.
A recent observational cohort study based on a database analysis observed protective effects of SGLT2 inhibitors compared to sulfonylureas. Another study, CVD-REAL Nordic, was the first large observational analysis performed in real world settings in three Scandinavian countries that evaluated the cardiovascular benefits of this class. It also showed that SGLT2 inhibitor use was associated with reduced cardiovascular disease and cardiovascular mortality compared with the use of other glucose-lowering drugs. Such real-world studies are less complicated and significantly less costly than traditional prospective randomized clinical outcomes trials. In addition, the reduced number of eligibility criteria ensures that the study results are representative and applicable to a much wider population. Recently, another study confirmed that real-world data analyses of patients receiving routine care provide findings similar to those found in a randomized clinical trial, and may even support (supplemental) regulatory applications. Real world evidence can sometimes complement or even replace randomized controlled trials, but prejudices and reservations have so far limited its acceptance.
Therefore, the underlying data sources must be reliable, and the methods used have to be defined in advance to avoid “data dredging” based on the findings. Furthermore, the data usually come from non-consented patients and therefore the highest standards of data privacy must be ensured.
The present study was undertaken to evaluate whether the results of the EMPA-REG OUTCOME and CVD-REAL Nordic studies can be replicated in a federated network of EHR, and whether they can be applied to a predominantly United States population. As controls, we chose DPP4 inhibitors, which represent another homogeneous and relatively new non-metformin class. We found a significantly lower incidence of stroke or myocardial infarction in the SGLT2 group within the three-year observational period compared with the control group.
In a federated data network, individual data sets never leave the source (i.e., the data warehouse of a healthcare organization). Instead, the analyses are done based on aggregated statistical counts. At the time of this analysis, our platform limited the methods that could be applied to correct for potential confounding factors, such as pair matching or propensity score matching (PSM). While PSM is a popular method of preprocessing data for causal inference, it is controversial since it may accomplish the opposite of its intended goal, such as increasing imbalance or bias. In addition, the censoring by PSM, which excludes certain patients from the analysis, reduces the sample size and the representation of a diverse patient population, thus re-introducing the criticism often applied to randomized clinical trials regarding their very restrictive eligibility criteria.
We therefore chose to build subgroups of the study population according to the presence of potentially confounding factors, and to test these strata individually. SGLT2 inhibitors have a contraindication for renal insufficiency and are a relatively new class of antidiabetics with less long-term experience than comparator classes, such as metformin or DPP4 inhibitors. One can therefore assume that the treatment decision by prescribing physicians may be driven by a patient’s renal function, patient age, and other potential risk factors. Indeed, we found a lower mean age in the SGLT2 group, similar to CVD-REAL Nordic before matching. Furthermore, the SGLT2 group had fewer patients with CKD than the comparison group. In prospective randomized clinical trials, such factors are usually balanced by randomization; in a retrospective analysis, they must be corrected for. We therefore created five strata, based on age ≥ 60 years, hypertension, CKD, insulin therapy or metformin therapy, and tested the event rates individually in each of these subgroups. The fact that the overall highest event rate was found in the higher risk stratum (patients with CKD) provides internal validation for the selection of the strata.
All strata showed very similar risk ratios for cardiovascular events (according to our definition using ICD10 codes for myocardial infarction or stroke), which were consistently in favor of the SGLT2 inhibitor group, i.e., between 0.62 and 0.81. This generally confirms the findings of the CVD-REAL Nordic study, where the hazard ratios for cardiovascular mortality and for major cardiovascular events were in a similar range, at 0.53 and 0.78, respectively.
Due to the nature of the design (retrospective, non-randomized) and data analysis (federated, aggregated strata), this study could be done very quickly, simply and with minimal cost, but it has several limitations. Non-randomized comparisons bear the risk that patients’ disease state influences the treatment decision and thus introduces imbalances. We limited balancing for confounders to five major factors and did not further correct for residual, potentially confounding factors like other co-morbidities, duration of diabetes, glucose or HbA1c values, concomitant medications or length of exposure to concomitant treatment. Our outcome criteria were simply the ICD10 codes for myocardial infarction or stroke, relying on correct coding at the source without differentiation between morbidity and mortality. Because one specific compound numerically dominated in each group (SGLT2: canagliflozin 78%, DPP4: sitagliptin 69%), we consider the results as representative of a class but not robust enough for a comparison of two individual compounds.
Real world studies depend on the prescribing and documentation behavior of the data-providing institutions. We used EHR in structured form rather than claims data. This has the advantage of complete medical information coming from the respective Health Care Organization, but data may be lacking if a patient visits another institution. This especially applies to medication and prescription refills. While we defined an observational period of three years, we could not validate whether the patients actually stayed on their medication for the whole period, as we defined the treatment group based on a single documentation of SGLT2 or DPP4 in their records. Insofar as a difference in compliance or persistence between the groups could introduce a potential imbalance, the approach would be similar to the intention-to-treat principle, which is applied to randomized clinical trials.
Furthermore, differences in the completeness of medical records between comparison groups need to be taken into consideration as well. In searching for a potential documentation bias, we found similar data density in the SGLT2 cohort compared to control (Table 3).
Table 3 Data density in the two comparator cohorts.
Theoretically, one could assume that more events had been found in the control group simply because this patient cohort was better documented. In real world studies, consideration of different therapeutic settings and documentation completeness is important, e.g., when comparing oral vs injectable medication, or inpatient vs outpatient procedures. However, SGLT2 inhibitors and DPP4 inhibitors are both taken orally and prescribed in similar settings. In addition, our data overall found about 20% more events in the DPP4 group, but the density of facts per patient in the documentation of this group was only 6% higher. Therefore, a documentation bias as an explanation for the difference in CV events in this study is very unlikely.
In conclusion, this study was conducted by analyzing EHR of approximately 38 million patients from 35 healthcare organizations, mainly from the United States. This real world clinical setting allows the analysis of data from patients with a much broader cardiovascular risk profile than the highly selective population in randomized clinical trials. The federated structure of this network ensures the highest level of data privacy standards, but poses some restrictions on the possible analytics, such as matching by propensity scores. Despite these limitations: (1) this analysis could replicate the results from much more complex and costly studies on the same topic, which validates our methods and the quality of data in the network; and (2) our analysis shows that the cardiovascular advantages of SGLT2 inhibitors found in the Scandinavian CVD-REAL Nordic study can be applied to the United States population.
Therapy for diabetes mellitus intends to control blood glucose values, to prevent or delay diabetic complications such as chronic kidney disease or retinopathy, and to reduce the likelihood of cardiovascular events like myocardial infarction or stroke. Several randomized clinical trials and sophisticated European registries have suggested that sodium-glucose co-transporter-2 (SGLT2) inhibitors may have an advantage in preventing cardiovascular events.
Randomized clinical trials are conducted on highly selected patient populations and follow very artificial treatment protocols. This sometimes makes it questionable whether the results are representative and can be applied to routine medical practice.
To determine whether positive results from randomized clinical trials with SGLT2 inhibitors can be confirmed by real world data from actual routine medical practice in the United States.
A federated research network was used, allowing analyses of electronic medical records (EMR) from 38 million patients in 35 large Health Care Organizations predominantly in the United States. Cardiovascular events occurring during a three-year observation period after the start of therapy with an SGLT2 inhibitor were counted and compared to a control group starting dipeptidyl peptidase 4 inhibitors. Comorbidity strata were created to address potential confounders.
In the overall cohort and in all comorbidity strata, the risk of experiencing a cardiovascular event was similarly in favor of SGLT2, with risk ratios ranging from 0.62 to 0.81.
The analysis of data from patients with a much broader cardiovascular risk profile than the selected population in randomized clinical trials could replicate the results of such trials. This validates the methods and quality of data in the network, and allows extrapolation of the trial results to the general patient population.
Sophisticated analyses of high quality EMR can complement costly, complex and lengthy randomized clinical trials, can assess their representativity for actual medical practice in the real world, and may even, in certain instances, be able to replace them.
Manuscript source: Unsolicited manuscript
Specialty type: Endocrinology and metabolism
Country of origin: United States
Peer-review report classification
Grade A (Excellent): A
Grade B (Very good): 0
Grade C (Good): C
Grade D (Fair): 0
Grade E (Poor): 0
P-Reviewer: Avtanski D, Senol MG; S-Editor: Dou Y; L-Editor: Filipodia; E-Editor: Song H