Because of the high-stakes nature of this process, both scientifically and financially, researchers need to scrutinize the source data that support these efforts and assess their fitness for use. Much of this scrutiny comes from the analysis angle, which you are all aware of: understanding completeness, bias, and potential confounding. In the biopharma world, where time-to-event methods such as survival analysis dominate, there is an emphasis on examining endpoints such as survival information in a quantitative and robust way. Ultimately, this means focusing on the development of a more complete mortality variable within the real-world evidence framework.
Our analyses have shown that structured mortality data in the EHR can exhibit sensitivity levels as low as 65% compared to our reference data set. Even assuming that every death recorded in the EHR is a true death, that is, no false positives, this means that 35% of actual deaths are not recorded in the structured EHR field, and that is the best-case scenario.
So how does mortality missingness actually impact potential real-world data use cases? To help illustrate why this matters, let's consider a few concrete examples. A particularly important case for the mortality variable, and really for any variable of interest, is experimental design. Specifically, it's important to understand how different levels of mortality capture affect both the sample size requirements and the inherently connected effect sizes of your analysis.
When we perform a randomized controlled trial, we often use outcomes from real-world evidence and observational studies to inform a targeted effect size for what we would consider a promising treatment. In this figure, we show an example where 25% capture of mortality in an observational study would lead researchers to infer a greater sample size necessary to achieve an effect size of 1.5, as compared to the 100% data capture they would likely find in a clinical trial. In fact, we see that effect sizes much closer to one are not even detectable here with low-completeness data.
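As a rough illustration of this sample-size effect, here is a minimal sketch using the Schoenfeld approximation for the number of deaths needed in a two-arm log-rank test. This is our own illustrative calculation, not the model behind the figure; the function names, the 60% event probability, and the 1:1 allocation are assumptions.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: deaths needed for a two-arm log-rank
    test with 1:1 allocation under proportional hazards."""
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(hazard_ratio) ** 2

def required_sample_size(hazard_ratio, event_prob, capture_rate):
    """Patients needed when only `capture_rate` of true deaths are
    recorded: observed events = N * event_prob * capture_rate."""
    return math.ceil(required_events(hazard_ratio) / (event_prob * capture_rate))

# With full mortality capture vs. 25% capture, the same target effect
# size (HR = 1.5) demands a roughly fourfold larger cohort.
full = required_sample_size(1.5, event_prob=0.6, capture_rate=1.0)
partial = required_sample_size(1.5, event_prob=0.6, capture_rate=0.25)
```

Because the log-rank test is powered by events rather than patients, any deaths missing from the data must be compensated for with proportionally more enrollment.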
Another use case of real-world data is applying analyses to compare effects, such as those of one treatment versus another. Imagine that completeness increases along the Y axis here, and the red dotted line is the true hazard ratio comparing a referent treatment and the treatment of interest. In certain cases, where mortality is missing not at random, low completeness can bias our estimates away from the truth, leading to incorrect inferences, and can also result in greater relative uncertainty. We see that this confidence interval and point estimate are very far from the red dotted line.
In cases where data are missing at random, we may be able to accurately capture the true outcome within the 95% confidence interval of our point estimate, though that interval is going to be wider. The ideal completeness situation is the bar at the very top, which is what we at Flatiron are trying to achieve with our mortality capture: with high completeness, we can both accurately estimate the point estimate and narrowly estimate the 95% confidence interval.
Another potential use case for real-world evidence is supporting survival extrapolation of clinical trials further into the future. If, for instance, a clinical trial only followed patients for three years but we wanted to investigate five-year survival outcomes, we could use real-world evidence to support this. However, using incomplete mortality data in these types of analyses can result in a biased overestimate of survival probability over time. This is a specific finding from our validation analyses, and we have slides later in the deck that highlight the details of these outcomes.
Finally, missing mortality data can also lead to incorrect inferences for clinical quality measurements. In this figure, we show an equivalent number of deaths at site A and site B, as indicated by the number of marks you see. A gold-standard reference on the right shows where the true deaths are, and the transparent squares show where death is not captured. While we know the ground truth that site A and site B are equivalent in terms of the number of deaths, if site B more accurately captures mortality data, we might improperly infer that site B is providing lower-quality care, with patients passing away at higher rates relative to site A, even though both sites actually have the same underlying death rate.
Before I go any further, I want to stop for a quick poll question, which should be launching on your screen momentarily. You can select one or multiple answers, and it's completely anonymous. Specifically, we're wondering which of the use cases we just highlighted is most relevant to you. Your answers will help guide and focus our efforts to create better material to support your work. We really appreciate your participation.
15 more seconds and then we'll close the poll. Okay, we're going to close the poll now and share the results. It seems like most of you selected comparative effectiveness and survival extrapolation as being of main interest. This is really exciting for us to see. We believe our approach to mortality will help you address at least some of the issues related to these questions in your daily work. We're definitely going to touch on some of these points moving forward, so please stay on the line.
How are we thinking about mortality at Flatiron? In clinical research, we allocate an enormous amount of effort toward collecting death data. In real-world data sets, this information is inherently less complete, because there isn't the same kind of intentional, prospective data capture that you get in a controlled environment like a clinical trial. The two most common sources of death data are the Social Security Death Index and the National Death Index, but each has its own unique limitations.
The SSDI is timely and accessible, though broadly incomplete. In fact, the sensitivity of SSDI mortality data compared to the gold standard has been decreasing over the past 10 years. We believe this is largely due to changes in federal reporting requirements: states are no longer required to report death data, resulting in decreasing completeness over time. The NDI, on the other hand, is often considered the US gold standard for mortality capture. However, it comes with substantial challenges around recency, where data can be delayed by nearly two years, along with substantial use restrictions and limitations, which make the NDI not particularly accessible as a death data source.
Given this broad range of use cases and the importance of a complete mortality variable in real-world evidence, a solution we have identified as promising for increasing the completeness of the Flatiron mortality variable is to consider death data from a set of sources external to the EHR. This was particularly exciting for me personally, since in the health field, when we think about real-world data, we automatically go to the EHR. But the real world, as we know it, extends far beyond that. What we at Flatiron have done is harness the power and breadth of evidence across multiple data sources, including obituary data and data from the Social Security Death Index. When we combine all of these together, the result is a composite mortality variable.
The Flatiron approach has been to scrutinize the composite mortality variable and develop it further while maintaining versioning along the way. Originally, with version one, we used structured EHR date of death and a preliminary linking algorithm to obituary data. In the new version, which we refer to as mortality V2 for those familiar with our data, we apply a set of business rules to combine structured EHR data and unstructured EHR documents, such as clinical notes, along with obituary data and SSDI, into one composite variable. The result is a substantially more complete data capture, which we continuously validate over time as new data sources and approaches are identified. For those who want the details, the validation methods and results for this composite variable are outlined in a paper published in Health Services Research by Curtis et al., and we're happy to share that with you after this webinar.
Let's dive into the details here. Our linkage algorithm is optimized based on comparative analysis against the gold-standard NDI data we mentioned previously. We generate a consensus date of death across three structured data sources, applying the following hierarchy when there are multiple dates of death. If any two or three of the dates are in agreement, we report that date. If none of the dates are in agreement, we employ the following ranking, in order of availability: we start with SSDI, then obituary data, and finally EHR data.
Lastly, and perhaps most importantly, we layer in abstracted mortality from our unstructured EHR data, such as clinical notes. What this means is that when we have a date of death known to the exact day from the unstructured EHR sources, this abstracted date of death takes precedence over any structured date of death we discussed previously. What does this all mean from a quantitative perspective? I want to hand this over to my colleague, Cherie, who's going to present the details of our mortality validation analysis, along with the really exciting results.
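To make the hierarchy concrete, the business rules just described can be sketched as a small function. This is an illustrative reconstruction, not Flatiron's production logic; the source names and data shapes are assumptions.

```python
SOURCE_PRIORITY = ["ssdi", "obituary", "ehr"]  # ranking when no dates agree

def consensus_date_of_death(structured, abstracted=None):
    """Pick a single date of death from multiple sources.

    `structured` maps source name -> date (or None); `abstracted` is an
    exact-day date of death from unstructured EHR documents, if available.
    """
    # Abstracted, exact-day dates take precedence over structured sources.
    if abstracted is not None:
        return abstracted
    dates = [d for d in structured.values() if d is not None]
    # If two or more structured sources agree, report that date.
    for d in set(dates):
        if dates.count(d) >= 2:
            return d
    # Otherwise fall back on the source ranking, in order of availability.
    for source in SOURCE_PRIORITY:
        if structured.get(source) is not None:
            return structured[source]
    return None
```

For example, when SSDI and obituary agree but the EHR differs, the agreeing date wins; when all three differ, SSDI is used first if available.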
Qianyi (Cherie) Zhang: Thank you, Sarah, for talking us through some really interesting use cases and an overview of our mortality variable. Hello, everyone. I am Cherie, and I also work on the Quantitative Sciences team here at Flatiron. I'm looking forward to discussing this exciting project: the validation analysis of our composite real-world mortality endpoint. We believe this work is really important because, to ensure we're getting high-quality results from real-world data research, we must understand the quality of the data. Before we move forward, I want to note that some of the results we'll be sharing today are confidential and will be presented at an upcoming major conference.
Okay, let's get started. First of all, I want to talk about the background and scope of this analysis. As mentioned, we have benchmarked our mortality data against the gold-standard NDI as an ongoing, continuous effort. Previously, we benchmarked our mortality data for four cancer types using data through 2015 and published a paper on it, as Sarah mentioned. In that original analysis, we included patients treated in community sites only, conducted a couple of stratified analyses, and assessed the impact of sensitivity on overall survival.
At the end of 2019, using the most recent available NDI data through 2017, we refreshed and expanded the analysis from 4 to 18 different cancer types. This included our broad cohorts of patients in the Flatiron enhanced data marts, as well as the clinico-genomic cohort, a smaller subset of Flatiron patients who underwent FMI genomic sequencing. This time around, we expanded the analysis to also include patients treated in academic sites. We also incorporated more extensive stratifications to provide further insight across different subpopulations. Additionally, we assessed the impact of data recency on validity metrics.
What exactly do we compare? Here, we show the three major validity metrics we assessed against the gold-standard NDI: sensitivity, specificity, and accuracy. Sensitivity, also known as the true positive rate, is the percentage of deaths in the NDI that were also correctly recorded in the Flatiron data. Specificity is the percentage of individuals without a death date in the NDI who were also not deceased in the Flatiron data. Accuracy is the percentage of Flatiron dates of death that matched the NDI date within a 15-day window.
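As a sketch, these three definitions translate directly into code. The variable names and data shapes here are illustrative, not Flatiron's internal representation.

```python
from datetime import date

def validity_metrics(ndi, flatiron, window_days=15):
    """Sensitivity, specificity, and 15-day date accuracy of Flatiron
    death data against the NDI reference. Both arguments map
    patient id -> date of death, or None if no death is recorded."""
    ndi_dead = {p for p, d in ndi.items() if d is not None}
    ndi_alive = set(ndi) - ndi_dead
    fi_dead = {p for p, d in flatiron.items() if d is not None}

    # Sensitivity: share of NDI deaths also recorded in Flatiron data.
    sensitivity = len(ndi_dead & fi_dead) / len(ndi_dead)
    # Specificity: share of NDI non-deaths also not deceased in Flatiron.
    specificity = len(ndi_alive - fi_dead) / len(ndi_alive)
    # Accuracy: share of matched deaths whose dates agree within the window.
    matched = ndi_dead & fi_dead
    close = sum(abs((flatiron[p] - ndi[p]).days) <= window_days for p in matched)
    accuracy = close / len(matched)
    return sensitivity, specificity, accuracy

# Toy example: 4 patients, 2 true deaths, 1 captured within 9 days.
ndi = {1: date(2017, 1, 1), 2: date(2017, 6, 1), 3: None, 4: None}
fi = {1: date(2017, 1, 10), 2: None, 3: None, 4: date(2017, 2, 1)}
```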
Let's look at some results next. Overall, when benchmarking against the NDI, we observe high sensitivity of greater than 84%, even higher specificity of greater than 94%, and very high date accuracy within a 15-day window of greater than 95% across all 18 cancer types. Let's take a closer look at the sensitivity. Here, we show the sensitivity from the original analysis for our composite mortality variable across four cancer types, ranging from 86% to 90%.
Next, let's look at some new results from the refreshed analysis of 18 different cancer types. First, you will notice that the sensitivities for the four cancer types from the original analysis are very consistent in this refresh. In general, we see sensitivity greater than 84% for all cancer types. We look forward to presenting this work as a poster at an upcoming scientific conference. As mentioned earlier, beyond the composite mortality variable, we have also assessed the validity metrics of the individual data sources. Let's take a look at them on the next slide.
As you can see in this table, we show the sensitivity not only of the composite mortality variable but also of each individual data source: structured EHR, obituary data, and SSDI. Specifically, the sensitivity of either structured EHR or obituary data alone is moderate, with a similar range of roughly 54% to 71% across cancer types. However, as you can see in the last column, the sensitivity of SSDI data alone is very poor, ranging from 18% to 32%. This is very interesting to see, since these results reiterate how much improvement in sensitivity was achieved with our composite mortality variable.
As we mentioned earlier, we also assessed the validity metrics by certain strata. As you may know, Flatiron data sets include roughly 80% of patients treated in community sites and 20% treated in the academic setting. In this table, we show the sensitivity of the Flatiron composite mortality variable stratified by practice type. We found the results are very similar for patients treated in community versus academic settings across all cancer types. Beyond these results, I want to show one additional result on the next slide, from an analysis of a sub-cohort: our clinico-genomic database.
Compared to the broad cohorts of patients in our enhanced data marts, the CGDB is a smaller sub-cohort of Flatiron patients who underwent FMI genomic sequencing. As an additional note, many of our broad cohorts include advanced or metastatic patients only, while the CGDB, as an all-comer cohort, may also include earlier-stage patients. To make the results comparable, we included patients of the same stage, such as advanced or metastatic, in both types of cohorts for this analysis and comparison. We found the validity metrics for the CGDB were similar to those of the corresponding broad cohorts.
Beyond the disease-specific cohorts, and as shown in the last row, we also calculated the sensitivity of the CG pan-tumor cohort, which includes patients diagnosed with any cancer not among the 18 specific cancer types; its sensitivity is also high, at 87%. We have looked at a lot of validity metrics so far. But how do these results impact downstream analyses, such as overall survival? Let's take a closer look.
To understand the differences among our death sources and their impact on OS, we generated Kaplan-Meier curves using each individual death data source and combinations of them, as well as the gold-standard NDI data. This plot uses advanced non-small cell lung cancer patients as an example. We index to the relevant cohort entry date, such as the advanced or metastatic diagnosis date, use the latest of the last structured visit or last abstracted oral medication end date as the censoring date, and use the death date as the event date. As you can see in this plot, from top to bottom, the sequential addition of obituary data, SSDI data, and abstracted death onto the structured EHR resulted in OS curves progressively closer to those using the NDI.
Let's take a look at the median OS estimates from these curves. The results in this table reiterate that sequentially adding death sources toward the full Flatiron composite mortality variable leads to a median OS estimate progressively closer to the estimate from NDI data. This demonstrates that using a less sensitive mortality variable risks overestimating survival, which underscores how critical it is to use a composite-source approach to reach a high-quality mortality variable.
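To see concretely how a missed death inflates median OS, here is a bare-bones Kaplan-Meier median in pure Python. This is an illustrative sketch of the mechanism, not the analysis code behind the table: a death missing from a less sensitive source effectively becomes a censoring, which shifts the estimated median upward.

```python
def km_median(times, events):
    """Median survival from a Kaplan-Meier curve.

    `times` are follow-up times; `events` flags death (1) vs. censored (0).
    Returns the earliest time at which S(t) falls to 0.5 or below,
    or None if the curve never gets there.
    """
    s = 1.0
    at_risk = len(times)
    # Process deaths before censorings at tied times (standard convention).
    for t, e in sorted(zip(times, events), key=lambda te: (te[0], -te[1])):
        if e:
            s *= (at_risk - 1) / at_risk
            if s <= 0.5:
                return t
        at_risk -= 1
    return None

complete = km_median([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])      # all deaths captured
undercounted = km_median([1, 2, 3, 4, 5], [1, 1, 0, 1, 1])  # death at t=3 missed
```

With complete capture, the median in this toy example is 3; miss the death at t = 3 and it shifts to 4, the overestimate described above.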
On the next slide, I want to touch on some other important aspects of real-world data: recency and maturity. As you may know, in the real-world setting, once an event occurs, it may take some time to capture that event and reflect it in the data; this delay is called the time lag. Since we update and snapshot our Flatiron enhanced data marts on a monthly cadence, the natural question is whether and how the sensitivity of mortality data increases across snapshots, as we allow longer time for data capture and follow-up.
We conducted an additional time-lag analysis to answer this question. First, I will discuss our approach. We identified a subset of patients from the main analysis who were included in multiple time snapshots, and then calculated the sensitivity of mortality data across these different snapshots. The four snapshots we chose, with different lengths of time for mortality data capture, are shown on the slide. Let's look at some results next.
What does the sensitivity look like for the same patients across different time snapshots? As you can see, across all cancer types indicated on the X axis, the sensitivity of the composite mortality variable remains high regardless of the length of mortality data capture across the different snapshots post-2017. It is typically only about 1% higher when using the most recent snapshot compared to the earliest. We saw a similar trend for the individual mortality data sources as well. These results led us to the next question: how long does it take to actually capture a death once it has occurred? To answer this question, we stratified patients by the time of their death according to the NDI. Let's take a look on the next slide.
In this plot, we included patients from all cancer types as one bundled cohort, so that we can show the time of their death on the X axis in six-month intervals from 2011 to 2017. As you can see, for the earlier death events, the sensitivity of the mortality variable is very stable across snapshots. But if we look at the far right of the figure, at the more recent deaths during the second half of 2017, using January 2018 as the earliest available snapshot, we saw a sensitivity already as high as 83%. If we allow an additional six months for data capture by using the July snapshot, we see 87% sensitivity, an increase of about 4%. With further time, sensitivity continued to increase, but with smaller gains.
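The snapshot calculation can be sketched as follows. The `capture_dates` structure, mapping each patient to the date their death first appeared in Flatiron data, is a hypothetical representation for illustration only.

```python
from datetime import date

def sensitivity_by_snapshot(ndi_deaths, capture_dates, snapshots):
    """Share of NDI deaths already captured as of each snapshot date.

    `ndi_deaths` is a set of deceased patient ids per NDI; `capture_dates`
    maps patient id -> date the death first appeared in Flatiron data
    (a patient is absent if the death was never captured).
    """
    return {
        snap: sum(p in capture_dates and capture_dates[p] <= snap
                  for p in ndi_deaths) / len(ndi_deaths)
        for snap in snapshots
    }

# Toy example: 4 NDI deaths, 3 eventually captured at different lags.
deaths = {1, 2, 3, 4}
captured = {1: date(2018, 1, 5), 2: date(2018, 2, 10), 3: date(2018, 6, 20)}
by_snap = sensitivity_by_snapshot(deaths, captured,
                                  [date(2018, 1, 31), date(2018, 7, 1)])
```

Later snapshots allow more time for capture, so sensitivity can only stay flat or rise from one snapshot to the next, mirroring the pattern on the slide.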
What did we learn from these results? Consider cohorts that include very recent patients, for example, patients receiving a recently approved late-line therapy. In such cases, allowing for six months of potential follow-up lets us maximize the sensitivity and specificity of the mortality variable. I know we have shared a lot of results so far. As a recap, let's look at some key takeaways together.
First, we're excited to see these positive results from our analysis, with high validity metrics observed across numerous cancer types. Second, from the results by strata and sub-cohorts, such as the CGDB, we want to emphasize the importance of ensuring the completeness and quality of mortality data not only in broader populations but also in the sub-cohorts relevant to a given study. Lastly, by assessing the validity metrics and OS impact of each individual data source, we learned that not all real-world mortality data sources are of equal quality. Understanding the validity metrics of a given source is a critical step toward generating reliable real-world evidence.
Additionally, I want to cover some limitations of this study. This study considered the NDI as the gold standard; therefore, we rely on the assumption that the NDI is 100% complete and accurate. Also, given the poor recency of NDI data, which was only available through 2017 when this analysis started, we were not able to assess validity for more recent deaths after 2017. We're now going to do a second and final poll question, which should be launching on your screen momentarily. The question is: what characteristic do you value most when using real-world data for survival analysis? Please select one answer. Once again, this will be anonymous.
Okay, 15 more seconds and then we will close the poll. We will now close the poll and share the results. It looks like the top two answers are accuracy and completeness. I'm glad we have covered both of these aspects during this webinar. Thank you all so much for participating; this is really good to know. Before we wrap up, I want to highlight a few analytic considerations when working with the Flatiron mortality variable.
First, we observed some differences in sensitivity by race and region. Specifically, patients from the West or Puerto Rico have relatively lower sensitivity compared to the rest of the patients, and we saw lower sensitivity in Asian and Hispanic patients compared to patients of other races and ethnicities. Given these findings, we recommend conducting sensitivity analyses by excluding or stratifying by these characteristics when appropriate. Second, as discussed in the time-lag analysis, for a cohort that includes more recent patients, we recommend allowing six months of potential follow-up time to maximize the sensitivity and specificity of the mortality data.
Lastly, a final point that is more of a consideration for EHR data: we might see a small number of patients with structured activity after their date of death. This is typically an issue with the dates of the structured activity rather than the date of death, and we recommend treating the patient as deceased whenever a date of death is available. We have incorporated all of these considerations into our latest articles on the Flatiron Knowledge Center, which Nina will describe in more detail shortly. I will hand it back to Nina from here. Thank you all so much for listening.
Nina Shak: Thanks, Cherie. We'll get to Q&A in just a few minutes, but before we do, I wanted to share a couple of quick things. We spoke about validation efforts throughout this presentation, and part of Flatiron's rationale for investing in validation is to ensure that we can meet partner needs at every step of the research process. Specifically, there are a few ways we can help your team. As Cherie reviewed, Flatiron has invested in analytical guidance for common applications of mortality information. This can be found on Flatiron's Knowledge Center, which is also where standard publication language can be pulled, with appropriate citations.
We also have both a Publications and a Regulatory team who can support researchers in ensuring that use of this variable is appropriately described and contextualized for the sub-cohort of interest. Should you receive questions about our validation efforts, those can be addressed by citing our 2018 Curtis publication, or Flatiron can provide cohort-specific detail to journals and regulators as requested.
To provide a bit more color on the Knowledge Center: this is Flatiron's repository of documentation and guidance on Flatiron data for our life sciences partners. You can find things like your organization's subscriptions, monthly data delivery notes and data dictionaries, information about specific data products such as our enhanced data marts, CGDBs, and Spotlight projects, as well as tutorials and methodology guidance.
Following this webinar, you will receive an email that will include relevant materials related to this presentation. Additional collateral will also be posted to the Flatiron Knowledge Center for active Flatiron customers. We also have a number of upcoming webinars involving market access, as well as cohort selection, which we will also share via email. Keep your eyes out for those.
Now for the Q&A portion of our webinar. Thanks for all the questions you've been submitting; we've received numerous questions so far. As I mentioned at the beginning of the webinar, we have three more members of the Flatiron team on the line now for the Q&A portion, who also helped with the validation analysis discussed just a few minutes ago. I have Dr. Nate Nussbaum, a medical oncologist and a member of our Research Clinicians team; Dr. Christina Parrinello, another member of our Quantitative Sciences team; and Kellie Ciofalo, Head of Strategic Operations at Flatiron and also the project manager for Flatiron's mortality variable.
For our first question, it was asked, how frequently will Flatiron complete these validation analysis efforts? Again, how frequently will Flatiron complete these validation analysis efforts? I'll hand this one off to Kellie.
Kellie Ciofalo: Thanks, Nina. Yeah, we're planning on completing these efforts every three years. We don't have reason to believe validity will change substantially over time, unless our capture rates or access to any of our data sources changes.
Nina Shak: Great. Thank you, Kellie. For the next question, it was asked: why are all dates of death not available in the electronic health record? Again, why are all dates of death not available in the electronic health record? I'll hand this one off to Nate.
Nate Nussbaum: Thanks, Nina. There can be several contributors to this one. First, families may not remember to notify the oncologist that a loved one has passed away. Even if an oncologist does find out, the oncologist or the practice may not document the information, since it's not always directly relevant to their clinical workflows. It's also possible that patients transfer to another clinic or to hospice, which decreases the visibility and frequency of contact between the practice and the patient's family. It's for all of these reasons that, when we create the mortality variable, we supplement the EHR data with other data: the SSDI data and obituary data. Thanks.
Nina Shak: Thanks, Nate. Next question. It was noted that I see visit and other structured activity in the Flatiron data set after date of death for some patients. Why is this? Again, I see visit and other structured activity in the Flatiron data set after date of death for some patients. Why is this? Sarah will respond to this one.
Sarah Cherng: We do expect to see some patients with structured activity, such as uncanceled orders, after their date of death. In the real world, patients sometimes pass away with every intention of seeing their oncologist again for administrations, tests, or routine and recurring office visits, for example. For these reasons, we do see a small number of patients with activity after their date of death; for instance, less than 1% of patients in NSCLC have visits greater than 30 days after the date of death.
This can be problematic to address in analysis, and our mortality team has looked into this situation in the past via chart review. We found that the date of death is usually correct, with the post-death activity date being incorrect. Because of this, and given the high specificity and sensitivity of our mortality variable, we typically consider patients with this discrepancy to be deceased and use the date of death as reported in our analyses.
If you're interested, you can conduct a sensitivity analysis to better understand the potential impact of death dates in scenarios where the date of death precedes the last activity date. It's also really important to consider any imputation decisions your team has made. For example, when imputing the 15th of the month for a date of death, it's possible to see structured activity in the second half of that month, which may not actually be erroneous.
Nina Shak: Great. Thanks, Sarah. Next question, will you be adding the new mortality variable to the Flatiron structured EHR data that we currently get? Again, will you be adding the new mortality variables to the Flatiron structured EHR that we currently get? I can field this one. That's a great question. Our mortality composite variable is already delivered in the vast majority of our offerings, including enhanced data marts, also referred to as core registries or EDMs for short, clinico-genomic database subscription, so those are the CGDBs with Foundation Medicine, and also Spotlight projects. But definitely follow up in the chat box for any questions specific to your organization as these are confidential questions and we can get back to you offline on any specifics.
The next question, is it possible to include both SSDI and NDI to complement the EHR data? Again, is it possible to include both SSDI and NDI to complement the EHR data? I'll hand this one off to Christina.
Christina Parrinello: Thanks, Nina. It is true that the NDI is very complete. However, NDI data is not available in a timely fashion, and it has usage restrictions that prohibit its inclusion in our commercial data sets. For this reason, Flatiron uses the NDI data to validate our endpoint, as we showed, which combines Social Security Death Index data and obituary data with our EHR information.
Nina Shak: Thanks, Christina. The next question, does Flatiron or any of the sites in the network follow up via phone or email with relatives to confirm if a patient has died? Again, does Flatiron or any of the sites in network follow up via phone or email with relative to confirm if a patient has died? Sarah will respond to this one.
Sarah Cherng: Yeah. Death reports can come in many forms. In some cases where communication is a part of routine practice, clinical teams may follow up about a patient status and document learnings in the charts. In other cases, family members of a loved one or hospice center may communicate the death to the physician and the team. These are just some examples that vary by site and patient situation. Our abstraction approach does take all of the relevant forms of communication into account, but we really want to note that Flatiron does not follow up directly with patient families.
Nina Shak: Thanks, Sara. Next question, when you combine different sources to get mortality or a death event, can you identify the cause of death? This could be disease directly related or not related to the disease at all. Again, when you combine different sources to get mortality or death event, can you identify a cause of death? I'll hand this one off to Kellie.
Kellie Ciofalo: Yeah. Cause of death is not delivered with our composite variable, but it can be abstracted for specific use cases. We recommend that you reach out to your LSP contact or to firstname.lastname@example.org, that's email@example.com for any further discussion on this.
Nina Shak: Thanks, Kellie. Next, is there an explanation for why early breast cancer has much larger confidence intervals when validated against the NDI for the sensitivity analysis? Again, is there an explanation why early breast cancer has much longer confidence intervals when validated in the sensitivity analysis? I'll hand this one off to Christina.
Christina Parrinello: Sure. Our early breast cancer data mart is relatively small compared to our other tumor types. Additionally, early breast cancer patients typically have a better prognosis than other cancer patients, so a very small number of death events was included in this analysis. This resulted in lower precision for the validity metrics for early breast cancer. Thanks.
Nina Shak: Thank you, Christina. Next question, how do you differentiate between a patient who died versus a patient who migrated to another network or simply stopped receiving care? Again, how do you differentiate between a patient who died versus a patient who migrated to another network or simply stopped receiving care? Christina will also answer this one.
Christina Parrinello: Sure. This is another great question. All of our available data sources document death events specifically; in other words, our mortality variable does not infer death events from any proxy. That said, in cases where the period of survival is observed for a patient, that is, where the date of death is known, the observed data can be used to estimate the population-level median overall survival and the survival curve. However, as with most real-world longitudinal analyses, objective evidence of death may not be available for some patients for a whole host of reasons, most commonly loss to follow-up, for example, transferring treatment to a new clinic. In these cases, those patients are most appropriately treated as censored. Thanks.
Nina Shak: Thanks, Christina. Just one moment, I think we're about to wrap up as we have answered most of them. We will be finishing a little bit earlier today. If there are any questions that we didn't get to live at this time, we will definitely follow up via email. If you have any questions about the content presented or you want more information on Flatiron's mortality variable, please don't hesitate to reach out to your life sciences contact or at firstname.lastname@example.org. Again, that's email@example.com. Upon closing out of this webinar, you will be prompted to take a short survey to help us improve for future webinars. We would greatly appreciate your time if you can fit this in. Have a great rest of your day. Everyone, stay healthy and safe.