Regulatory Perspective on Real-World Endpoints with Sean Khozin

Thank you for subscribing to the RWE newsletter.

Speaking at the 2018 Flatiron Research Summit, Sean Khozin, formerly of the FDA’s Oncology Center of Excellence, discusses the challenges, as well as the progress, in generating reliable endpoints from real-world datasets.

Of note, Dr. Khozin proposed a framework for contextualizing the many endpoints available to researchers working with real-world data. Rather than holding each endpoint to the same standard, he proposed thinking about them in three categories: “validated”, “reasonably likely”, and “candidate”. While many real-world endpoints fall into the “candidate” category, he acknowledged that endpoints in the middle category, “reasonably likely” (e.g., overall response rate), may be suitable for use in accelerated approvals.
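To make the framework concrete for readers who work with real-world data programmatically, here is a minimal sketch of the three tiers as a simple data structure. The tier names come from the talk; the example endpoint assignments and everything else in the code are illustrative assumptions, not a regulatory determination.

```python
from enum import Enum

class SurrogateStatus(Enum):
    """Validation tiers for surrogate endpoints, as described in the talk."""
    VALIDATED = "validated"                   # strong mechanistic and clinical evidence
    REASONABLY_LIKELY = "reasonably likely"   # may support accelerated approval
    CANDIDATE = "candidate"                   # promising, but not yet established

# Hypothetical mapping for illustration only; where a given real-world endpoint
# actually lands would have to be established through analytical and clinical validation.
EXAMPLE_ENDPOINTS = {
    "overall survival": None,  # direct clinical outcome, not a surrogate
    "overall response rate": SurrogateStatus.REASONABLY_LIKELY,
    "real-world progression-free survival": SurrogateStatus.CANDIDATE,
}

for endpoint, status in EXAMPLE_ENDPOINTS.items():
    label = status.value if status else "direct clinical outcome"
    print(f"{endpoint}: {label}")
```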

Transcript

Sean Khozin: Thank you, great to be back. I'd like to talk about an organizing framework for evaluating real-world endpoints. The topic is a very complex one, like all the other topics that have been discussed today, and a very nuanced one as well. And I believe it helps a lot to have a foundation based on prior experience, as well as emerging regulatory concepts, that can help us move forward in thinking about real-world endpoints and how to increase our level of confidence in their validity and precision.

So, I'm going to step back and start from the basics in terms of clinical trial endpoints. What are they? We all intuitively know this, and there is a regulatory definition for a clinical trial endpoint, which says that endpoints are essentially measurements that capture outcomes of interest in a clinical trial. These measurements can be laboratory measurements, tumor measurements, and a variety of other measurements that can be captured in the clinical setting. And when it comes to the clinical endpoints that we typically use in clinical trials for regulatory decision making, generally speaking, there are two different types: direct measurements and indirect measurements.

Direct measurements are essentially clinical outcomes: measurements that directly capture an outcome that's clinically meaningful in terms of how the patient is feeling, functioning, or surviving. So, overall survival is a direct measurement. And when it comes to using these endpoints in regulatory decision making, they are typically used for traditional approval decisions. And traditional approval is a new term for regular approval.

And there are also indirect measurements, which we call surrogates. These are very important endpoints that have been used increasingly in oncology clinical trials over the past couple of decades, and what's interesting is that, based on the latest comprehensive data available to us, between 2010 and 2012 nearly half of FDA approvals were based on indirect surrogate endpoints. So these endpoints are very important in clinical development and are used very widely. And when it comes to surrogate endpoints, there are a variety of different surrogates, and each one is positioned along a continuum of validation, from a candidate surrogate endpoint all the way to a validated endpoint. So the most concrete and the most, let's say, trustworthy surrogate endpoint is one that's validated. What that means in the regulatory sense is that it's an endpoint supported by a clear mechanistic rationale and clinical data that provide very strong evidence that an effect on the surrogate endpoint can directly predict a specific and clinically meaningful benefit. That is what we call a validated surrogate endpoint.

There's a second category, called reasonably likely, and these surrogate endpoints are typically used for making accelerated approval decisions. Overall response rate belongs to this category: it's a surrogate endpoint that's reasonably likely to predict clinical benefit. And looking at it this way, which is the technical definition, overall response rate isn't really a validated endpoint; it's a reasonably likely endpoint that the community feels very comfortable with in terms of predicting a clinically meaningful outcome. And the third category is a candidate surrogate endpoint. Some of the real-world endpoints that we are examining today belong to that category. Those are candidate endpoints that show a lot of promise in terms of giving us information about a meaningful clinical outcome.

And the 21st Century Cures Act, which has come up several times today, asked the FDA, actually mandated the FDA, to come up with a list of all the surrogate endpoints that the agency has used in making drug and biologic approval decisions. Here is just a snapshot of some of the oncology-related surrogate endpoints, and there's a link underneath with the full list, which is actually downloadable. As you can see, there are a number of surrogate endpoints that have been used, and I already mentioned that overall response rate is typically used for making accelerated approval decisions. That's a very robust, if you will, surrogate endpoint that, based on community consensus and the experience that we've had at the FDA, is reasonably likely to predict clinical benefit. So it's been used for making accelerated approval decisions. There are obviously serum biomarkers, and there are different ways of validating these biomarkers. And with these surrogate endpoints, if they are being used as the primary endpoint, then the performance characteristics of the vehicle or the test that's being used to generate them become very important.

And there's a biomarker qualification program at the FDA, and that's one route to qualify and validate these biomarkers. But typically biomarkers don't go through a qualification program; they're incorporated through clinical development programs, and the analytical validation and the clinical validation are done as part of the development program. And there are cytogenetic composite endpoints that are typically used, in this case for CML, and there are time-to-event endpoints; this is event-free survival, which has been used in the past for both accelerated and traditional approval decisions. So there are a number of surrogate endpoints, a variety of different ways of validating these endpoints, and different validation types.

So, there's obviously analytical validation, and that speaks to the technical performance of the endpoint, or the mechanism through which the endpoint is generated. So if the data is being captured from electronic health records, and that data is also combined with other data sources, the way that those dots are connected, the technical performance of that measurement, is the analytical validity. And this concept is very close to the same principles that we use for companion diagnostic assays, where the performance characteristics of the assay itself can be analytically and clinically validated. Analytical validation speaks to the technical performance, and in this case the technical performance speaks to how the data was gathered, the audit trails, and the reproducibility, for example, if the same technical measures are deployed on similar datasets. And analytical validity, of course, doesn't tell you anything about whether that endpoint itself is clinically useful or meaningful and has clinical validity.

All the real-world endpoints that we're talking about today have correlates in traditional clinical trials that have already been clinically validated. For example, progression-free survival: that's an endpoint that we know is useful and is clinically meaningful. So that makes the clinical validation piece somewhat easier; however, the technical performance is still very critical to make sure that real-world progression-free survival closely approximates progression-free survival in traditional clinical trials. And that's a function of the technical performance of that measurement.

So, one way that we can characterize real-world endpoints, and I believe this is very critical as we move forward: we need to have a mechanism, a framework, that can harmonize the nomenclature and basically move everyone in the same direction, making sure that we're speaking about the same things. And one of the best ways to categorize real-world endpoints is to look at these endpoints as drug development tools. “Drug development tools” is a new phrase that was introduced by the 21st Century Cures Act, and in fact Congress directed the FDA to develop a mechanism for designing, validating, and qualifying drug development tools. The way that drug development tools are defined in the 21st Century Cures Act is as follows: a drug development tool is a biomarker, a clinical assessment, or any other method, material, or measure that the Secretary determines can aid drug development. So essentially, real-world endpoints would fall under this category, and this is also the category that we're using to validate algorithms, AI algorithms. Many of these algorithms are being incorporated as software-as-a-medical-device solutions, as many of you know, and the way that we are approaching the validation, the clinical and the technical validation, of these algorithms is through the drug development tool pathway. And real-world endpoints can fall under the same pathway as well.

And I think that opens up a new area of exploration and discussion, and very interesting opportunities in terms of how to systematically approach defining and validating real-world endpoints. In many cases these endpoints can be thought of as drug development tools for clinical outcome assessment. Many of us are familiar with clinical outcome assessment tools, and PROs are part of that, and when it comes to looking at these clinical assessment tools, the same validation principles apply. The language is a little different, but a lot of the concepts are similar. And for clinical assessment, we have construct validity, which is based on quantitative methods to make sure that the endpoints, and the methodology that is used quantitatively to describe these endpoints, align with a pre-specified hypothesis.

And there's also content validation for clinical assessment tools, which speaks to the use of qualitative research methods to make sure that the concept of interest is captured, including evidence that the domains being used to define these endpoints are appropriate and comprehensive relative to the intended measurement concept, the population, and the intended use; this is sometimes called face validity. And I believe this is the nomenclature that we can use to anchor the discussions that we have about real-world data endpoints. And with that, I'd like to welcome Dr. Abernethy back to the stage.
