On-demand webinar
How AI is transforming hospital utilization management
In the complex landscape of hospital utilization management (UM), effectively applying AI and automation is essential to achieving meaningful results.
Good afternoon and welcome to today's program, titled From Promise to Practice: How AI is Transforming Hospital Utilization Management. My name is Chris Cheney, CMO Editor for HealthLeaders, and I will serve as your moderator for today's program. Today's program is sponsored by Optum. Thank you to our sponsor, and to you, our audience, for giving us your time and attention. Before we get started, I have a few housekeeping details. First, to ensure that you can see all content for the event, please maximize your event window and be sure to adjust your computer volume settings and/or PC speakers for optimal sound quality. Second, you will find a resources list for today's webinar on the upper right side of your screen. Here we have listed materials from our sponsor for you to interact with. Third, at the bottom of your console are multiple widgets you can use. Should you experience any technical difficulties during today's program and need assistance, please click on the Help widget, which has a question mark icon and covers common technical issues. Finally, it is my pleasure to introduce our speaker for today's program, Alex Yurukimov, Director of Product Management at Optum. Thank you, Alex, for taking the time to speak with us today. And with that, the audience is yours.

Thank you, Chris. Good afternoon, everybody. My name is Alex, and I have the great privilege of leading the InterQual Auto Review solution as well as other inpatient UM automation initiatives at Optum. Our agenda today: first, we're going to acknowledge the current state of AI. It's in the news, it's front of mind, but we're not going to spend a whole lot of time on it. We'll walk through some of the key considerations for building and maintaining AI solutions. We're then going to spend the majority of our time exploring different applications of simple, intermediate, and advanced AI in utilization management.
And then we should have a good amount of time left over for any questions or discussion that folks may have. Just very briefly: AI is front of mind not only for technologists but also for providers and the public in general. The healthcare AI market is sizable and growing at a very healthy 40% year over year, and this rapid growth has raised concerns about the pace of adoption, especially of the more advanced AI models. The allure of AI cannot be ignored. It promises to be a powerful support tool that enables clinicians to practice at the top of their license while at the same time removing administrative burden and abrasion from the system at large. What I will emphasize throughout our discussion today is that despite its promise, we cannot forget that AI remains a tool that still requires clinical judgment to be applied appropriately.

A brief note on data types: we're going to be talking throughout the presentation about both structured and unstructured data, and both types are used extensively. Just so that we're all on the same page, structured data is any information that's organized in a predefined way: think labs, vitals, flow sheets. Unstructured data is essentially everything else: documents, images, videos, reports, emails, etcetera. Unstructured data is messy, to no one's surprise hopefully, and that messiness impacts both the development and the performance of AI models, which have to be trained to understand not only clinical nuance and abbreviations in context, but even just where the data is found. As an example, if we were to contemplate an AI solution that evaluates operative notes to predict procedure codes, one might think that we'd look in the operative notes section of the medical record. But we have found them not only there: there are brief op notes, some systems choose to put their operative notes in procedure notes, others put them in progress notes, and others have custom note types.
And so when designing solutions, especially solutions for more than one health system, it's important to understand this variability and how and when to ignore note types that look similar but are not relevant. And then once we've identified the actual notes themselves, we need to account for the variability within them, as just one example. And while it's easy to point to unstructured data for its messiness, it's also important to remember that even with structured data, values can vary. Labs are a good example here: from system to system, from facility to facility, or even between labs within the same facility, the same lab test might have different reference ranges that need to be accounted for.

I want to spend just a few minutes talking about the process of building, implementing, and sustaining AI models. It's never been easier to create a prototype solution; in fact, later on I'm going to show you, and we can create one together. But don't let that ease fool you. There are a lot of solutions that look really impressive in a sandbox environment but then fail under real-world scrutiny: data variability, workflow constraints, integration requirements, security and governance considerations. Probably the most important takeaway here when considering AI solutions, and I alluded to this a little bit already, is that AI models are living, breathing capabilities. They require ongoing monitoring, evaluation, and tuning, and must be resourced accordingly. Production-grade, workflow-integrated AI solutions are not suited for skunkworks development efforts where a team swoops in, builds, deploys, and then moves on to the next project. Be very wary of any claims otherwise, as anyone making such claims either has not actually built such applications or hasn't stuck around to see what happens after they go live.
Whether you choose to partner with a vendor in the market or to build something in house, there are a couple of key steps to highlight for successful model development. The first is universal and isn't actually AI specific: ensure that the problem you're looking to solve is clearly and explicitly defined, with set parameters and expectations. Not only is this helpful in controlling the scope of the project and keeping everyone on track, but especially for AI models like LLMs, this becomes critical to control hallucinations and undesired results. Your AI team needs domain understanding, though not necessarily expertise, in the problem that you're looking to solve, because your AI team is the partner that ensures the right tool set is applied to the right use case. The AI team needs to partner with the end users early and iterate often to ensure that the solution reflects the intent and the needs of the problem you're looking to solve. Given how easy it has become to build and iterate on prototypes, there's really no excuse for this sort of development not to happen very tightly and iteratively, because often what's initially conceived, and I can talk about some examples here, may not actually prove to be what's needed. One thing that is often not thought about is that a robust governing board and review process is imperative to provide not only structure and transparency but also trust in model integrity and upkeep, whether internally or to the market. And finally, and I've mentioned this before and will mention it at least a couple more times, the job is not done when the solution goes live. It's important to have a monitoring strategy in place to ensure that model success is sustainable. There's a lot more that can be said, but these are the key points worth highlighting for the purposes of our discussion today.
And I don't bring this up to dissuade anyone from fully embracing AI into their UM workflow. AI already is, and will continue to prove, its transformational value. But what I am saying is that AI is a commitment from everyone involved. For the purposes of organizing our discussion, I'm going to classify AI into three tiers: simple automation, basic AI, and advanced AI. Each tier has its trade-offs. Simple automation is transparent but rigid, while advanced AI is flexible but opaque. And even with the great strides being made with reasoning capabilities in the latest LLMs, it's important to match the model to the problem that you're looking to solve and the risk constraints that might exist for your use case. Similarly, we're going to use this simplified framework of the UM process to orient our discussion as well. Keeping the process foremost in our minds helps us ensure that every application of AI is deployed in support of a specific step in the value chain and makes sense within the greater context of the jobs to be done. Across the top, we see a simplified patient stay with UM-relevant milestones. After each episodic review, documentation is assembled to support the patient's level of care before either being referred to secondary review or directly submitted for authorization. Depending on the payer determination, an appeal process may occur. Our goal in contemplating the application of AI is to see just how much friction we can remove from each step of the process, or whether a step can be fully automated or bypassed altogether. This overlay provides a brief preview of the different types of AI applications, how they come into play, and where throughout the process different types of applications may be more or less appropriate. Let's start with simple automation, which doesn't really feel like AI at all. Traditionally, this has been defined as expert systems, and the hallmark of these systems is that every rule is human authored.
The rules are explicit, and though they can get extremely complex, they are transparent. We're going to use InterQual and InterQual Auto Review to illustrate this capability. This is the InterQual acute criteria subset for heart failure. It's comprised of a series of individual evaluations that combine to present an overall recommendation of medical necessity. When users perform a manual review of this criteria, it actually mimics the function of an expert system. When we think about simple automation, it's important for each evaluation to be specific and explicit, even if more general guidelines could be used by a human. InterQual's focus on specificity and objectivity is key not only to the development of automation solutions, but also to achieving alignment and minimizing ambiguity between providers and payers when discussing medical necessity. Coding the criteria is not just about logic, however; it's about clinical nuance and intent. For example, if we're talking about an elevated heart rate, our perspective is that it must be clearly defined to avoid ambiguity and ensure automation is safe and defensible. A little more ambiguous, however, are concepts that are inherently understandable to a human, such as a value being sustained or something being clinically significant. A human can discern whether an abnormal reading, for example, requires immediate clinical response or is potentially a reading error. But for a system aiming to automate the evaluation of criteria, those conditions have to be set and defined. Very briefly, the way Auto Review automates the evaluation of the InterQual criteria is this: it extracts data from the EMR, maps the data to individual criteria points, and then evaluates it using a rules engine.
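To make the expert-system idea concrete, here is a minimal sketch of a rules engine with human-authored, explicit rules. The criteria names, chart field names, and thresholds are all invented for illustration; they are not actual InterQual content.

```python
# Illustrative expert-system sketch: every rule is human-authored and explicit.
# Criteria names and thresholds here are invented, NOT actual InterQual content.

def rule_elevated_heart_rate(chart: dict) -> bool:
    # An explicit, unambiguous threshold rather than a vague "elevated".
    return chart.get("heart_rate_bpm", 0) > 100

def rule_iv_diuretic_given(chart: dict) -> bool:
    return "IV diuretic" in chart.get("active_orders", [])

# The rules engine is just a collection of named, auditable rules.
RULES = {
    "Heart rate > 100 bpm": rule_elevated_heart_rate,
    "IV diuretic administered": rule_iv_diuretic_given,
}

def evaluate_criteria(chart: dict) -> dict:
    """Evaluate every criteria point; each result is traceable to a named rule."""
    return {name: rule(chart) for name, rule in RULES.items()}

example_chart = {"heart_rate_bpm": 112, "active_orders": ["IV diuretic"]}
results = evaluate_criteria(example_chart)
```

Because every rule is a named function, each decision in the output can be audited and traced back to its human-authored logic, which is the transparency property described above.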
The patient's documented reason for admission is used to determine which subsets are generated, and then regular updates from the chart ensure that the review reflects the latest information available. Even for simple automation like Auto Review, deep integration into the EHR is required; implementation requires data mapping, normalization, and clinical validation to ensure that the data nuance and variability I alluded to earlier are properly evaluated. So in practice, it looks something like this. The data is extracted, evaluated against the rules, and then exposed right at the reviewer's fingertips. You can open each folder icon to see which data was used to evaluate InterQual, as well as the rule that was applied to determine if it was met. You can also see the type of data that was extracted. Structured data, such as the BUN values, can be seen in a folder marked with a plus sign, while unstructured data, such as the criteria point for pneumonia by imaging, is denoted by a folder with an ellipsis. In some cases, the data is able to meet criteria; those lines are colored in solid teal. In other cases it's not; those lines are simply outlined, and the data is still populated for human review. In our experience, we've seen that this approach maps about 75 to 100% of the data needed to complete InterQual acute reviews. But the transparency is critical for defensibility and trust. Case managers and reviewers must be able to audit and, always important, override the AI decisions. This requires considerations for UI design, audit trails, and education, which may sometimes be overlooked in some AI deployments. Transitioning from simple automation to basic AI, we start considering probabilistic models. The way to think about this is simply that you have a collection of known examples, and the basic AI compares any new data it receives against the data it's already been taught.
It then generates some probabilistic output, either a score or a probability, based on how closely the new data resembles prior examples. One aspect of basic AI models that keeps them relevant today, even in the world of LLMs, is that they allow more refined control over the model's operating parameters, such as confidence thresholds. These can be set and adjusted based on the requirements of the use case. For example, what's the cost of the model being wrong? How conservative or aggressive can the model be? That's defined entirely by the use case. A couple of non-healthcare examples of this type of AI include any sort of recommendation algorithm, whether on Netflix or your social media platforms, as well as the autocomplete capabilities in Outlook and Word. The exact mechanism by which the model arrives at its conclusion is generally opaque to the user, even if the inputs are evident. In UM, basic AI presents several useful applications. One that comes to mind, and I alluded to it originally, is case stratification: a score that indicates how likely a given patient is to be an inpatient or an observation level of care. Criteria recommendation or subset prediction can be used to suggest the most appropriate subset or criteria based on a patient's presentation and driver of admission. We can look at models predicting length of stay in a UM context to allow care management and transition planning to begin right from day one, especially, for example, if we pair it with a predicted discharge destination. Let's say we have a patient who is likely to be discharged to a skilled nursing facility after a four-day stay. Well, if it takes us three days to secure a SNF bed, that's information that can be used right away by the care management team to avoid unnecessary bed days.
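The confidence-threshold idea can be sketched as follows, assuming a hypothetical model that already emits a probability that a case warrants inpatient status. The function name and cutoff values are illustrative only.

```python
# Illustrative: tuning a confidence threshold on a probabilistic model's output.
# The probabilities and cutoffs are invented for this sketch.

def triage(p_inpatient: float, conservative: bool = True) -> str:
    """Map a model probability to an action; the cutoff reflects the
    cost of being wrong for this particular use case."""
    cutoff = 0.90 if conservative else 0.70  # conservative = fewer false positives
    if p_inpatient >= cutoff:
        return "suggest inpatient"
    if p_inpatient <= 1 - cutoff:
        return "suggest observation"
    return "route to human review"
```

The point of the sketch is that the middle band, where the model is unsure, stays with the human reviewer; widening or narrowing that band is exactly the conservative-versus-aggressive dial described above.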
Finally, a patient's risk for readmission may play a factor when considering appropriateness of care. For a patient who might otherwise be ambiguous or borderline from a criteria perspective, knowing that the patient is a high readmission risk might be important. A note to keep in mind with basic AI models, because they are so dependent on the training data, is that for any localized application, the models must be calibrated and evaluated against the local population. Models trained on national data need to be validated, and for best results these models should be retrained regularly as new data becomes available. Natural language processing is one example of basic AI. Here we're seeing an NLP model in Auto Review that combs through imaging narratives in the patient record looking for key terms and phrases. As you see highlighted here, the keywords recognize the radiologist's assessment of pneumonia. The model can be adjusted to be more or less conservative, which is reflected in precision and recall, two parameters that are used to measure the model's accuracy when making predictions. Here again, while the exact mechanics of the prediction are not evident, the inputs used by the model are clearly displayed, enabling the user to quickly either confirm the model's conclusion or override it if appropriate. The user feedback can then be captured and used to refine the model for continuous improvement. Moving on to the secondary, or physician advisor, review involves a little bit of a different application of natural language processing. Here we use it to highlight key attributes and risk factors relevant to the patient status decision making. These factors are paired with citations to peer-reviewed research drawn from a curated base of evidence-based journal articles, and are then served up to the provider as needed.
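Precision and recall, the two accuracy parameters mentioned above, can be computed from a labeled validation set. A minimal illustration, with hypothetical variable names:

```python
def precision_recall(predictions, labels):
    """predictions: model flagged pneumonia; labels: chart truly documented it.
    Both are parallel lists of booleans."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)      # correct flags
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)  # false alarms
    fn = sum(1 for p, l in zip(predictions, labels) if not p and l)  # missed cases
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of flags raised, share correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of true cases, share caught
    return precision, recall
```

Tuning the model to be more conservative typically raises precision at the cost of recall, and vice versa, which is the trade-off the adjustment described above controls.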
Referencing the completed primary review, or a case summary if available, allows the physician to further focus on the exceptions identified in the primary review while evaluating the case, minimizing duplicate work. Quickly drawing the physician advisor's attention to the most common and most important data elements significantly speeds up secondary reviews. Unlike in the primary review, here the AI is playing more of a support role, allowing the physician to connect the risk factors within the greater context of the patient chart to arrive at a holistic determination. What if we start combining different models? Combining different models creates new opportunities for workflow impact that each model in and of itself may not be able to accomplish. Here we see how combining the automation of Auto Review and a predictive case stratification score can improve the reviewer workflow. InterQual Auto Review can show you which patients automatically met criteria for certain levels of care, from observation to critical, while Case Intelligence can show you the patients that have a high likelihood of being inpatient versus observation. In this case, case stratification scores of 0 to 100 represent cases that either have a low likelihood of inpatient admission or have limited available documentation; you can use the length of time the patient has been in the hospital as an indicator of which of these is likely. Scores of 200 to 300 represent a high likelihood of inpatient, while scores of 100 to 200 are more ambiguous and require additional review. Together, we can look at how these indicators bring the right cases into focus. Here, Auto Review and Case Intelligence together are telling you that Peter Pan and David Davidson have status mismatches. These are the cases that require immediate attention. Peter was placed in observation but automatically met acute criteria, as well as having a very high score.
This is an immediate opportunity to flip this observation case to inpatient status. Conversely, David was admitted to inpatient 12 hours ago, allowing enough time for a critical mass of data to accumulate. However, Auto Review only met for observation, and his score is not very high. This is going to be another priority item to escalate to the physician advisor to determine if David was placed at the right level of care. When working with any sort of scoring model, it's important that it be explainable or accompanied by other tools. If a case is flagged as high priority, reviewers need to understand why. It's sort of like if you're driving and the passenger in the front seat suddenly says, hey, watch out. While the warning is appreciated, more detail is needed for it to be helpful. Finally, we'll move on to advanced AI models like LLMs and AI agents. This class of AI has experienced very rapid change over even just the past six months. When this webinar was first scheduled, there was a lot of discussion around reasoning LLMs and their ability to expose the AI's chain of logic. Now reasoning LLMs have ceased being their own separate entities and are just baked into the base models. Advanced AI models can be differentiated from basic AI models by their ability to generate content and act on the digital, and even the physical, world around them through agents. If you've used Microsoft Copilot or any modern customer service chatbot, you're probably interacting with an LLM. In addition, the foundational models that all of these applications are built on, things like ChatGPT, Google Gemini, and others, are free to use in what's arguably the greatest example of democratizing cutting-edge technology that we've ever seen. Now, one thing to point out that might not be obvious: LLMs are extremely powerful but require guardrails. Without prompt engineering, they're likely to hallucinate or generate inappropriate content.
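One way to sketch the combination of the two signals just described, an automated criteria result plus a 0-to-300 stratification score, using the score bands from the talk. The function name, status labels, and exact logic are illustrative, not the product's actual implementation.

```python
# Illustrative combination of an automated criteria result and a 0-300
# case stratification score into a worklist priority. A sketch only.

def priority(current_status: str, criteria_met_for: str, score: int) -> str:
    """Flag status mismatches that warrant immediate attention."""
    # Score bands from the talk: <100 low/limited data, 100-199 ambiguous, >=200 high.
    score_suggests = (
        "inpatient" if score >= 200 else "observation" if score < 100 else "ambiguous"
    )
    if (current_status == "observation" and criteria_met_for == "acute"
            and score_suggests == "inpatient"):
        return "flip to inpatient: immediate opportunity"
    if (current_status == "inpatient" and criteria_met_for == "observation"
            and score_suggests != "inpatient"):
        return "escalate to physician advisor"
    return "routine review"
```

Run against the two example patients, Peter (observation, acute criteria met, high score) surfaces as a flip opportunity and David (inpatient, observation criteria only, low score) surfaces for physician advisor escalation, which is how the combined signals bring the right cases into focus.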
There are some common techniques used to manage this, such as system prompts to lay out foundational rules governing how the AI will approach a given task, and breaking down complex tasks into smaller, more explicit steps with minimal opportunity for the model to stray outside its set parameters. Additional techniques include mandating citations for any claim, or even having a second instance of the same AI fact-check the output of the first. But even with all of these approaches, it's very important for robust clinical oversight and validation workflows to be in place for any of these LLM models to be used operationally, especially in a healthcare setting. And I mentioned this a little earlier, but one thing to highlight is that there has been an incredible proliferation of AI solutions, most of them LLM based, in the market. It's important to remember that almost all of them derive from just the handful of foundational models I referenced above. The principal differentiator between specific, focused solutions is generally how the prompts have been engineered and any supplemental data that's been used to refine the model's understanding of the problems to be solved. Let's talk about a couple of specific use cases for advanced AI in utilization management. The first one I would bring up is the generation of clinical summaries, to accelerate case orientation, handoff, and authorization submission, as well as the generation of documentation and letters in support of the auth process. LLMs are also well suited to the evaluation of complex criteria that may not lend themselves readily to codification or simple probabilistic evaluation, for example, automatically evaluating which surgical procedures are designated as inpatient.
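The guardrail techniques described earlier, a system prompt that sets foundational rules, task decomposition into explicit steps, and mandated citations, can be sketched as message assembly. No vendor API is invoked here; the prompt wording and message structure are illustrative assumptions, and the actual LLM call depends on your vendor's API.

```python
# Sketch of prompt-engineering guardrails: a system prompt with foundational
# rules, an explicit step breakdown, and a required-citation rule. This only
# assembles the messages; it does not call any LLM.

SYSTEM_PROMPT = (
    "You are a clinical documentation assistant. "
    "Use ONLY facts present in the supplied chart excerpt. "
    "If a fact is not documented, say 'not documented'. "
    "Cite the source line for every clinical claim."
)

def build_messages(chart_excerpt: str) -> list:
    # Breaking the task into small, explicit steps narrows the model's
    # opportunity to stray outside its set parameters.
    steps = [
        "Step 1: List the documented findings, one per line.",
        "Step 2: List the treatments given.",
        "Step 3: Summarize in two sentences, citing findings by line.",
    ]
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "\n".join(steps) + "\n\nChart:\n" + chart_excerpt},
    ]
```

A second instance of the model can then be handed the output plus the same chart excerpt to fact-check each cited claim, the verification technique mentioned above.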
LLMs offer superior performance to more basic AI models when identifying a procedure code based on the description found in the operative note, while simple automation can then be used to evaluate that code against a given list. Finally, one very promising example is AI agents. They represent an intriguing opportunity to automate the genuinely complex process of auth submission through payer portals when an API may not be available for the task. Routine tasks such as document assembly, portal navigation, and even documentation submission can be delegated to a collection of AI agents, each of which can be specialized to a given payer scenario. We're now going to take a closer look at two use cases: case summarization and appeal letter generation. The first is going to highlight some of the LLM's strongest capabilities, the distillation and synthesis of a large and complex body of information. Then we're going to explore a bit of a cautionary tale in appeal letter generation, which highlights the importance of proper model design as well as the continued importance of clinical judgment, especially when using AI. The idea of a case summary is to provide the reviewer with concise Cliff's Notes, if you will, of the case, to reduce orientation time and help ensure that the most appropriate guidelines are used when performing a review. The case summary can also give the reviewer an indication of how ready the case is for review, or whether key information is missing, which might indicate that more time is needed. Furthermore, the summary provides complete context for the patient when accompanying a focused criteria review, whether submitting for authorization or handing off to secondary review.
When deploying such a solution, it's important to keep the workflow foremost in mind. Where is it most useful to have a case summary? As early in the workflow as possible; having it in the reviewer worklist provides the greatest value by reducing unnecessary time spent on cases that are not yet ready for review. Once a case has been determined to be ready, the case summary can be used to ensure that the most appropriate criteria are selected right from the outset. Finally, including the case summary with any review summary helps provide a single, concise packet for either secondary review handoff or auth submission. One of the most anticipated, and even hyped, applications of AI in the utilization management space is using AI to draft appeal letters. Appeal letter generation promises to lower the cost of, and the barrier to, appealing adverse determinations. However, this is a double-edged sword, and we're going to use this scenario to illustrate the importance of exercising clinical judgment when using these tools. Just because you can does not mean you always should. Consider the following fictitious case, and I invite you to replicate this experiment yourself with the aid of the document available in the resources section that we talked about earlier. The document contains two prompts; these are the same prompts that I used to generate what you see on the slide here, and they can be used with any freely available LLM or your instance of Copilot. Here, I've asked the AI to generate a plausible scenario for a patient who comes to the ED, is treated, and then is appropriately sent home, as they do not qualify for admission to either inpatient or observation. For ease of consumption, I've shown the output here with only minor abbreviations. The patient presents with unspecified abdominal pain and general discomfort. The patient had two instances of vomiting earlier today and has abnormal but non-definitive lab results.
The patient responds to IV fluids and Zofran in the ED and is then sent home. Clear enough. But what if instead they were admitted, and then the inpatient auth was denied? Can we use AI to argue for an appeal? Of course we can. Continuing in the same session, I indicated that instead of the patient being discharged home, the patient was admitted to an inpatient level of care for dehydration and vomiting, and that the inpatient authorization was denied. I then asked ChatGPT to draft me an appeal letter, and this is what it produced. The prompt that I used to generate this is also available in the accompanying document. Let's take a look at how it did. Again, I applied only minor abbreviations to get the output to fit on the slide. This looks potentially plausible, but is any of this correct? Perhaps unsurprisingly, not really. Remember, the AI's own original output explicitly created a scenario that did not warrant inpatient admission. That did not stop the AI, however, from crafting a defense for inpatient nonetheless. I've highlighted a few of the key arguments that the AI attempts to make, and we'll go through a couple of them here. First, the appeal letter references persistent vomiting several times, despite the fact that the patient only had two episodes earlier that day according to the record, and did not have any in the ER. GPT brings up systemic instability, though there are no indications of it in the chart. GPT also claims that the patient has an inability to tolerate oral medications and hydration. Now, it's possible that this is inferred from the IV fluids and IV Zofran, but it's not documented in the chart. The appeal letter references aggressive IV hydration even though the patient only received a single liter of saline. And the AI claims that both InterQual and MCG guidelines support inpatient admission.
Now, speaking from the InterQual perspective, dehydration and gastroenteritis, while that is a subset, is an observation-only subset and cannot be used to support an inpatient level of care. Even for the observation scenario, the patient needs to receive at least a liter of fluid, demonstrate persistent vomiting after two doses of an antiemetic, and have a sodium level greater than 150 and a heart rate over 100 after fluids have been administered. Almost none of these have been met in this scenario. Admittedly, this is an extreme example, using a deliberately poor system prompt to illustrate a point. The point is that with the right inputs and the right prompts, you can make an AI produce almost any result that you want. I'd encourage you to play with the starting prompt and manipulate it to see whether you can get different results, reduce hallucinations, and make the argument more defensible. To compound this danger, if we combine this AI-generated appeal letter with an AI agent, it becomes even easier to navigate the payer portal and automate the appeal submission. So why not just appeal everything? Well, if we walk through the downstream effects of such an approach, we'll see that we would quickly flood the payer with appeal volume. That would increase backlog and turnaround time, ultimately increasing days in accounts receivable and lowering turnover rates. All this to say that the advent of AI tools, no matter how powerful, does not fundamentally change the problem that we're solving. It does not change the patient's appropriateness for receiving care. To close this case out, I just want to make it clear that this illustration is not intended to discourage the use of AI tools to draft UM documentation. It's meant to highlight the importance of using clinical judgment and discretion in choosing when they should be used.
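The observation-scenario conditions described above can be expressed as the kind of explicit checks an automated evaluation would need. This paraphrases the conditions as stated in the talk, treats them as jointly required for the sake of the sketch, and uses invented field names; it is not actual InterQual criteria text.

```python
# Observation-scenario conditions as explicit checks. Paraphrased from the
# talk's description, with invented field names; NOT actual InterQual text.

def observation_criteria(case: dict) -> dict:
    checks = {
        "received at least 1 L IV fluid": case.get("iv_fluid_liters", 0) >= 1,
        "persistent vomiting after 2 antiemetic doses":
            case.get("vomiting_after_two_antiemetics", False),
        "sodium > 150 after fluids": case.get("sodium_after_fluids", 0) > 150,
        "heart rate > 100 after fluids": case.get("hr_after_fluids", 0) > 100,
    }
    checks["all conditions met"] = all(checks.values())
    return checks

# The fictitious ED patient: one liter of saline, vomiting resolved, labs near normal.
fictitious_case = {
    "iv_fluid_liters": 1,
    "vomiting_after_two_antiemetics": False,
    "sodium_after_fluids": 141,
    "hr_after_fluids": 88,
}
result = observation_criteria(fictitious_case)
```

Run against the webinar's fictitious patient, only the fluid condition holds, which is consistent with the speaker's conclusion that the appeal letter's claims were not supported by the chart.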
When used appropriately, this capability can greatly strengthen the auth submission and, we believe, reduce upfront denials. As an example, for a heart failure patient with comorbid kidney failure who cannot tolerate a full dose of diuretics and does not meet the full level of care per InterQual, AI is very well positioned to quickly and easily draft a well-supported exception letter for this common exception, saving the reviewer time to prepare and submit the case and perhaps even avoiding the need for a secondary review altogether. Which brings us back to process. If we find ourselves in a position where we're trying to justify an admission after the fact, that means multiple points of failure have already occurred upstream in the primary and secondary review. Our abdominal pain patient should have been caught in primary review, with the aid of subset prediction and criteria automation, and managed to an appropriate level of care well before any auth submissions or denials had a chance to occur. AI tools offer us an immense opportunity to increase our efficiency and quality throughout the whole process, but they must be used appropriately, at the right place and time. Our goal is to create a frictionless conversation between the provider and the payer, one where cases that are likely to get denied are not sent as inpatient, and the ones that are sent are fully complete. Let's use AI to help us identify the right driver of admission: why is the patient actually here? Let's use AI to extract all the right data and eliminate the tedious, error-prone manual data entry exercise that we do today. We should use AI to alert us to gaps in data. Patients admitted for heart failure are likely to get a troponin drawn. If my heart failure patient doesn't have a troponin, we should probably wait until we get that lab result before submitting an authorization. And finally, let's use AI to alert us to the data results.
Does the data support the patient's severity of illness and intensity of treatment? If it does, then proceed to auth submission. If it doesn't, that's where clinical judgment comes in. AI tools can help us answer all of these questions, but they cannot replace clinical judgment, which brings us full circle to the classic triad of people, processes, and technology. AI is a tool just like any other that has gone before it, a really powerful tool, and it has the opportunity to play a very powerful role in transforming the UM experience and outcomes. However, without people who trust and understand what the tool does and how it can be most efficiently used, AI's impact is going to be muted at best and could even be detrimental. And while the human capacity for creativity and adaptability can compensate for a lot, broken process can undermine even the best tools and most capable teams. To bring us to conclusion, I'd invite you to come away with the following key points about the application of AI in utilization management. First and foremost, AI solutions are still tools just like any other. Different AIs can be better suited for different types of jobs, and they still need to be applied in the context of people and processes to deliver value. AI solutions are living, breathing capabilities that require continuous monitoring and upkeep even after deployment, and must be resourced accordingly. And finally, AI platforms have made it very easy to create good-looking proofs of concept. Don't be fooled: turning these POCs into workflow-integrated production solutions still requires all the work of traditional software development, and more. Thank you for your time, and I will now welcome any questions you might have. Thank you, Alex, for your excellent presentation. This now brings us to the Q&A portion of the program. You may submit questions through the Q&A box to the lower left of the screen. And please note that all questions will remain anonymous.
Taking a look at the questions here, the first one: how do you ensure that your AI solution is trustworthy and aligns with clinical best practices? Building trust in any solution is a process that takes time, and the more transparent the solution is, the faster you get there. Utilization management is not new; it's been done for years, and it takes time in any scenario, whether it's AI or otherwise, to adopt and trust a new process. So education is a big element of the process. I think time, being patient with people, and allowing people to make their own way to that trust matter too. There are other elements, such as the AI governance review board that I alluded to earlier and regular audits of your model. You can use data to some extent, but ultimately it's a question of user adoption, and that always comes down to education and training. There's no shortcut to that. And what about making sure that the AI solution is aligned to clinical best practices? Ultimately, it comes down to what data is used to train the model. In our case, the case of InterQual, we have the InterQual evidence-based criteria to fall back on. And regardless of whether the process is being done manually or by an AI, the fact that the same foundation is used in both cases helps make sure that we're following evidence-based guidelines. There is an opportunity to integrate real-world evidence and outcomes results in conjunction with evidence-based guidelines, but in our opinion, that's an "and" rather than an "or." Excellent. How much real-world impact have AI solutions shown to have on the UM process?
The impact that we have seen from customers that have adopted and implemented these solutions ranges from administrative outcomes to CDI-adjacent improvements to even reductions in denials. I think ultimately the opportunity that AI and automation present in utilization management is a clear and unambiguous picture of the truth. The data is the data. And the more we are able to have an automation tool present what is true, the less you have differences of opinion, and the greater the opportunity for alignment between various stakeholders. Great. Can you discuss the scoring methodology for projecting level of care in the inpatient versus the outpatient setting, or denial of hospital care? Sure. The scoring methodology is a straightforward, simple AI application where several million discharge records are used to create a profile of patients that discharge from inpatient versus patients that discharge from observation (OBS). That creates your training data, and any new cases that come in are compared against it. The question is: does your presentation look like that of a patient who ultimately ends up discharged from inpatient, or does it look like that of a patient who will end up being discharged from OBS? Now, it is worth mentioning here that when you first come into the hospital and there's no data about you, your score is going to be very low by default, because there's just no data to compare. That's why, in the example I referenced, I pointed to looking at the amount of time the patient has been in the hospital, to give you a barometer of how representative that score is likely to be of the patient's final disposition.
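The profile-comparison idea described in that answer can be sketched with a simple nearest-centroid scheme: average the features of historical inpatient discharges and OBS discharges into two profiles, then score a new case by which profile it sits closer to. This is a hedged illustrative sketch of the general technique, not the actual proprietary model; the feature names and the distance-ratio score are assumptions for the example.

```python
# Hypothetical sketch of discharge-profile scoring: build average feature
# profiles from historical inpatient vs. OBS discharges, then score a new
# case by relative distance. Features and numbers are illustrative only.
import math

def centroid(records):
    """Average each feature across a list of historical discharge records."""
    keys = records[0].keys()
    return {k: sum(r[k] for r in records) / len(records) for k in keys}

def inpatient_score(case, inpatient_profile, obs_profile):
    """Score in [0, 1]: closer to 1 means the case resembles inpatient discharges."""
    keys = list(inpatient_profile)
    d_in = math.dist([case[k] for k in keys], [inpatient_profile[k] for k in keys])
    d_obs = math.dist([case[k] for k in keys], [obs_profile[k] for k in keys])
    return d_obs / (d_in + d_obs) if (d_in + d_obs) else 0.5

# Tiny illustrative "training data" (real systems use millions of records):
inpt = centroid([{"heart_rate": 118, "sodium": 152}, {"heart_rate": 124, "sodium": 149}])
obs = centroid([{"heart_rate": 92, "sodium": 139}, {"heart_rate": 88, "sodium": 141}])
score = inpatient_score({"heart_rate": 120, "sodium": 150}, inpt, obs)
```

The sketch also shows why an early score is unreliable: with few resulted data points, the case vector is sparse and the comparison carries little signal, which is the point made above about weighting the score by time in hospital.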
Excellent. Is there capability for AI to assist with pediatric-centric care when working in an adult-oriented guideline and review world? The short answer is yes. InterQual and InterQual Auto Review support both the adult acute and pediatric acute subsets in InterQual. The nuances of children having different reference ranges for what's normal and abnormal can be fairly easily codified in an acute setting, so the AI is already being used for the pediatric content. If there's a more specific use case, I'm happy to elaborate. Or was the question whether we can automate pediatrics? That was the question from the audience. OK. Next question: when should an organization consider creating an AI solution in house versus buying something commercially available or working with a vendor to build something new? I think this is a classic question. Since the dawn of technology, build versus buy has been one that every organization needs to ask. The process I walked through at the beginning, of building, maintaining, and deploying AI solutions, is the same whether it's done by a vendor or whether you do it in house. Ultimately, the questions you'd need to answer are: how unique is the problem that I'm trying to solve? And secondly, am I willing to invest the necessary resources to maintain the solution once it's live? If the answer to both is yes and the investment of standing up a bespoke solution is worth it, then go for it. But otherwise, generally speaking, don't reinvent the wheel unless you have to. Great. Are payers accepting the automated reviews? Yeah.
We are actively working on a number of pilots where payers are evaluating the automated reviews coming out of the provider environments. Similar to a previous question, there's a question of trust: how do we build trust in the automation? And similar to that question, the answer is that it takes time. It takes time to show that the data is correct and representative, and at that point you can start getting into interesting conversations about automating approvals where everyone agrees and the automation is able to fully evaluate the patient. Is there an opportunity here to eliminate redundant work on the provider and the payer side? If the provider is doing a review, the payer in turn does the exact same review, and everyone agrees all the time, then that becomes an opportunity for automation. And is there any way to make it more likely that a payer is going to accept an automated review? I think it starts with relationship. There are two components: trust in the technology, and then historical trust, depending on the relationship that a given provider has with the payer. That sets the starting point, the anchor, for whether the payer will accept the automated review. So if your organization historically has good rapport and limited friction, that is likely a shorter path to acceptance than otherwise. Interestingly enough, what we have shown in some of the work that we're doing with our customers now is really using the automated reviews to identify trends and opportunities to reduce upfront denials, because what the automation is able to identify is based on cohorting.
If you have a specific cohort of auths that tends to be challenging, you can look at the automation and ask: is there a common data element that I'm missing here? Is there a specific area that's getting hung up? It opens up much more productive conversations because they become conversations about the data. Oftentimes, especially with high overturn rates, that usually suggests there's an opportunity in the initial submission process. If we are able to use those insights to uncover the opportunities, let's eliminate that abrasion upfront rather than going through the appeal process. Great. With the implementation of an AI tool, is there an expected denial rate for inpatient care and an increase in OBS status proceeding to peer-to-peer (P2P)? This is a little bit of a loaded question, at least as I'm reading it, because when we talk about the impacts of any solution, they will be significantly affected by what your current baseline is. If your current OBS rate is very high, and we can look at national averages, then there is a very likely opportunity that implementing an AI tool will help increase the OBS-to-inpatient conversion rate, as an example. However, with any of these transformations, it becomes tricky to tease apart what part is people and processes and what part is regression to the mean. If the AI tool makes it clear that a patient does not meet an inpatient status, the patient clearly is observation, and the tool and the data are trusted and billed accordingly, then that would likely lead to an increase in OBS status.
Again, the opportunity to have a data-driven conversation with the advent of these automation tools becomes one of the most exciting aspects of the tools. As for the impact on denials and OBS status, that is very much organization dependent. And while it would be very easy to say, yes, we've had customers see a 26% reduction in their initial auth denial rate, a lot of that can be attributed to where one organization starts versus another, which is highly reflective of how much of an impact the solutions can have. Excellent. Are payers countering hospitals' AI UM tools with their own? Again, while this makes for a good sound bite, the reality of what we've seen is that the first opportunities in leveraging AI and automation, especially in an inpatient setting, are to eliminate redundant efforts, where both organizations agree with a high degree of certainty. Patients who come into an ICU with a head trauma are never denied. Those are going to be the first opportunities to apply AI to reduce redundancy. The application of AI for automating inpatient denials, again, while it makes for a compelling sound bite, is not something we have seen play out from the Optum perspective. Oh, I love this question; the CDI question is amazing. This is one of the areas where we're seeing a very strong, if not unexpected, benefit, because automation is really good at identifying, for example, whether specific providers are admitting patients with vague diagnoses, or whether specific diagnoses show tendencies toward a specific data element that's not documented consistently.
Or whether there is a consistent element that the UM team keeps coming back to the care team to get clarity on. The most low-hanging fruit is that the docs are vague in their admission diagnoses or don't provide admission diagnoses at all. But you can go very deep and very broad with the CDI application of automating utilization management, because you're able to have data at scale around what different cases look like and what the common gaps between cases are. What gets more interesting is that you can then correlate that with your denial data: for auths that get denied on the first pass, what are their attributes? Once you identify that, it becomes an opportunity not only for UM but for the CDI teams to improve documentation and workflows. While CDI traditionally has a very RCM aspect to it, from a documentation improvement perspective that's more clinical, this has been one of the most exciting opportunities we see coming out of automating UM. Great. What does the AI landscape look like for behavioral health, which doesn't have as many data points, such as vital signs, to illustrate acuity? Yes, great question. One of the reasons why acute criteria lend themselves so much better to automation has been the degree to which you can use structured data to determine acuity and medical necessity. Behavioral health is, by its very nature, a lot less cut and dried. I think the advances that have come from large language models, specifically within the past eight months, have really begun to open the opportunity to even begin exploring behavioral health in a way that we've never really had the technical capability to do.
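The correlation step described in that answer, cohorting first-pass denials by the documentation element the automation found missing, can be sketched with a simple frequency count. This is an illustrative sketch only; the record shape (`denied`, `missing_elements`) is a hypothetical assumption, not a real system's schema.

```python
# Hypothetical sketch: surface CDI opportunities by counting which
# documentation elements are most often missing among denied cases.
from collections import Counter

def denial_gap_trends(cases):
    """Rank missing documentation elements among first-pass denials."""
    counts = Counter(
        gap for c in cases if c["denied"] for gap in c["missing_elements"]
    )
    return counts.most_common()

cases = [
    {"denied": True, "missing_elements": ["admission diagnosis"]},
    {"denied": True, "missing_elements": ["admission diagnosis", "troponin"]},
    {"denied": False, "missing_elements": ["troponin"]},
]
trends = denial_gap_trends(cases)
# → [("admission diagnosis", 2), ("troponin", 1)]
```

At scale, a ranking like this is what turns denial data into a concrete worklist for the CDI team: the top entries are the documentation gaps most associated with first-pass denials.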
And so while there will certainly be a road to getting to not only behavioral health but the whole outpatient space, anything that does not lend itself as easily to explicitly codified data points is where LLMs are going to come into their own. The first use cases that are going to be rolled out will be fairly basic LLM functionalities, case summarization and things like that. But there's nothing but opportunity in areas like behavioral health for LLMs to be applied. I think we have time for one more question. Is the automation capability available throughout the patient's stay, for example into day two, day three, etcetera? Yes. This is functionality that is going through early adopter testing now, but for the acute criteria, we are very excited to bring continued-stay capabilities from an automation perspective. One of the challenges with day two, day three, and continued stay is the fact that, unlike day one, which with rare exceptions everyone does, which days you evaluate patients for a continued stay will differ based on payer, acuity, etcetera. OBS patients get reviewed daily, whereas other patients might just get a day one and day three review. Those have been some of the considerations we've had to take into account for continued stay. But the automation that we have been doing for day one is coming to day two, day three, etcetera. That's all the time we have for questions today. I want to thank Alex Yurukimov once again for an excellent presentation, and our sponsor Optum for making this program possible. If you are interested in learning more about AI and automation solutions, a survey will appear on your screen; it is also located on the bottom right side of your screen.
Finally, thank you to you and our audience for participating today. We hope you'll join us in the future for another Health Leaders webinar. This concludes today's program.
Related healthcare insights
On-demand webinar
Discover how to leverage automation for centralized, exception-based utilization management.
Article
Learn more about why evidence-based criteria and clinician insight are needed for informed and trustworthy decisions while leveraging AI.
Article
An efficient utilization management (UM) program is a must to deliver high-quality patient care while managing costs.