Deobfuscating Machine Learning Assurance and Approval

Updates

June 03, 2025

The push for new capabilities and safety improvements in aviation drives technological innovation, including the adoption of machine learning (ML). However, unclear assurance and approval frameworks hinder ML’s integration into type-certificated civil aircraft, delaying safety-enhancing progress. The paper “Deobfuscating Machine Learning Assurance and Approval,” authored by experts from Joby and Merlin, demystifies the regulatory framework and practical requirements for assuring systems that use ML in civil aircraft, turning a complex conversation into actionable expectations that can accelerate safe innovation in aviation. This blog post offers a high-level overview of the paper.

Current assurance methods for items like software should allow low-safety-risk ML applications to gain approval under existing regulations, while higher-risk uses should be evaluated through case-by-case negotiation and other novel assurance methods. Routine approval of low-safety-risk applications, such as those that provide advisory information, will allow the industry and regulators to try new methods without risk or penalty, so that both parties can gain experience and establish confidence in those methods. Advancing ML integration for both low-safety-risk and higher-risk applications will require strategic research, validation, and policy development to raise the baseline for routine approvals and establish clear pathways for higher-risk applications.

Navigating Assurance and Approval in Machine Learning: Principles, Practice, and Burden

Assurance and approval in aircraft type certification ensure that aircraft and system designs meet the regulatory safety standards set by the FAA for safe operation in the National Airspace System. Regulations mandate safe performance under expected conditions and are guided by an “all and only” principle: the aircraft, as a whole, must be shown to do all that it is intended to do and only that which is acceptable. Developers show adherence to this principle using a suite of standardized assurance methods.
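As a rough illustration of the “all and only” principle, the two halves can be read as two set comparisons. The sketch below is a toy under stated assumptions: the function `all_and_only` and every behavior name are hypothetical, and real showings rest on requirements, verification, and analysis rather than on enumerated sets of behaviors.

```python
def all_and_only(intended, acceptable, observed):
    """Compare behaviors of a hypothetical item against its intent.

    missing:      intended behaviors that were never observed ("all" fails)
    unacceptable: observed behaviors outside the acceptable set ("only" fails)
    """
    missing = set(intended) - set(observed)
    unacceptable = set(observed) - set(acceptable)
    return missing, unacceptable


# Hypothetical advisory function; all names are illustrative only.
intended = {"display_traffic_advisory", "log_advisory"}
acceptable = intended | {"self_test"}
observed = {"display_traffic_advisory", "self_test", "command_pitch"}

missing, unacceptable = all_and_only(intended, acceptable, observed)
print("not shown to be done:", sorted(missing))
print("done but not acceptable:", sorted(unacceptable))
```

Both result sets would need to be empty for this toy item to satisfy “all and only.”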

Assurance methods help ensure that each step of an aircraft design is consistent with the others. Demonstrating clear alignment between the aircraft’s overall design intent and the systems and software that implement it is essential for building a safe and reliable aircraft. This process is also critical for showing that a system does “all and only” what it is intended to do. As a design becomes more complex, for example through the addition of novel safety features, it is important to maintain a clear connection between each layer of the system so that everything still reflects the original design intent.

Showing that the aircraft as a whole meets “all and only” is typically achieved through item-level showings that support system-level showings, which in turn support the aircraft-level showing. In simpler terms, certifying an aircraft is like building a pyramid: every block has to fit perfectly to support the next layer. To prove the aircraft does exactly what it’s supposed to, and nothing more, engineers start by verifying the smallest pieces (like software or components), then show how those come together into larger systems, and finally how those systems make the whole aircraft work safely. When machine learning is used, it can only shape behavior through these individual components, like software. That means the toughest safety questions start at the foundation.

Unlike conventional development processes, ML can introduce unacceptable behaviors into items in ways other than human error. While this challenges existing assurance methods, it also suggests how the challenge may be addressed: by clarifying the assurance concerns and examining the processes used to demonstrate compliance.

Item-level Showings

Challenges arise from ML’s complexity, opacity, and reliance on data sufficiency, along with a lack of fault models and mature assurance methods. Assurance for ML-supported software focuses on demonstrating transformational integrity to meet requirements and integrate into the system. For low-criticality functions, existing assurance methods do not extend deep into the item development assurance processes, allowing ML to be used in components within these items without additional burden. In some cases, system-level demonstrations can reduce or eliminate certain item-level assurance burdens.

System-level Showings

System-level strategies offer an alternative by validating functional performance independent of implementation. Techniques like runtime assurance and performance-based validation address various uncertainties, including those introduced by ML and by environmental conditions. These approaches simplify compliance by focusing on overall system behavior, enabling applicants to bypass item-level complexities while ensuring alignment with safety intent.
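Runtime assurance can be pictured as a monitor that checks an unassured function’s output and switches to a simple, conventionally assured fallback when the output leaves an acceptable envelope. The following is a minimal sketch under stated assumptions, not the paper’s method: the class `RuntimeAssurance`, both pitch functions, and the envelope limits are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RuntimeAssurance:
    """Wraps an unassured function with a monitor and an assured fallback."""
    primary: Callable[[float], float]   # e.g. ML-based, treated as a black box
    fallback: Callable[[float], float]  # simple, conventionally assured
    lower: float                        # assumed acceptable output envelope
    upper: float

    def command(self, x: float) -> Tuple[float, str]:
        """Return (output, source), where source names the function used."""
        try:
            y = self.primary(x)
        except Exception:
            # Any failure of the unassured function triggers the fallback.
            return self.fallback(x), "fallback"
        # NaN fails this comparison, so it also routes to the fallback.
        if self.lower <= y <= self.upper:
            return y, "primary"
        return self.fallback(x), "fallback"


def ml_pitch_suggestion(airspeed_kt: float) -> float:
    """Stand-in for an ML component (hypothetical)."""
    return 0.05 * (120.0 - airspeed_kt)


def conservative_pitch(airspeed_kt: float) -> float:
    """Hypothetical fallback: always command a neutral value."""
    return 0.0


rta = RuntimeAssurance(ml_pitch_suggestion, conservative_pitch,
                       lower=-2.0, upper=2.0)
print(rta.command(110.0))  # output inside the envelope: primary is used
print(rta.command(300.0))  # output outside the envelope: fallback is used
```

The point of the pattern is that the safety argument rests on the monitor and the fallback, whose behavior can be shown with conventional methods, rather than on the internals of the primary function.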

Baselining and Expanding Options

The integration of ML into civil aircraft systems requires a structured approach to assurance and approval, balancing existing pathways with the need for innovative solutions. Establishing a defensible baseline for low-criticality applications provides a foundation for routine approvals while creating opportunities to refine assurance methods. Expanding this baseline through strategic initiatives and operational experience can address higher-criticality applications and persistent safety challenges.

To support this evolution, focused efforts on clear approval processes, targeted research, and workforce development will be essential in enabling scalable and efficient integration of ML into the aviation industry.

A Defensible Assurance Baseline

Low-criticality ML applications align with existing approval pathways, forming a foundation for assurance. Higher-criticality applications may require case-by-case approvals, but the lack of general-purpose criteria limits broader adoption.

Raising the Defensible Assurance Baseline

Expanding the baseline involves refining assurance methods to ensure predictive confidence and support efficient compliance. Prioritizing channels such as NORSEE, MOSAIC, and others can address safety challenges and simplify approvals for innovative technologies.

Directing Attention and Energy

Efforts should target clear approval processes, research-driven assurance methods, and workforce development. Education and policy codification are essential for scaling ML application approvals, with system-level strategies offering alternatives to item-level complexities by validating behavior regardless of implementation.

Low-risk and some higher-risk ML applications already have approval pathways, forming a defensible baseline. Education, outreach, and operational experience within this baseline are crucial to refining methods and raising the assurance standard. Civil aviation’s strong safety record reflects decades of prioritizing safety, and any new methods must align with regulatory safety intent, proving their value before adoption.

The full version of the paper was published in and presented at the 43rd Digital Avionics Systems Conference, San Diego, September 2024.
