The increasing interest in lunar exploration, particularly in establishing long-term infrastructure near the Moon, underscores the need for robust and autonomous spacecraft control systems. Lunar halo orbits are appealing locations for future missions, acting as potential hubs for both large and small spacecraft operations. However, the inherent instability of these orbits, combined with the risk of unforeseen events such as engine failures or communication disruptions, poses significant challenges to mission success. This study addresses the need for autonomous recovery strategies for spacecraft operating in unstable lunar halo orbits, emphasizing scenarios where traditional ground-based control is unavailable or inadequate.
In this research, we assume that, in the nominal scenario, the orbit is maintained using a traditional station-keeping method that is both effective and robust as long as two-way communication with the Earth is available. We then consider a situation in which an unexpected event temporarily interrupts communication and necessitates autonomous control. Such backup control operates for a limited duration, for instance a couple of revolutions around the libration point, until communication with the spacecraft is restored.
The primary goal of this work is to design and study a reinforcement learning-based approach for spacecraft control in emergency situations around unstable halo orbits. Unlike conventional control methods that rely on precise state estimation, our approach operates directly on raw sensor measurements, specifically the directions to the Moon and Earth. This capability is crucial for enabling autonomous decision-making when precise state information is unavailable or unreliable. Reinforcement learning methods have already proven effective in astrodynamics and are particularly well-suited for scenarios where control is based on measurements rather than states.
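As a rough illustration of what such measurement-based control can look like, the sketch below forms an observation vector from the unit directions to the Earth and the Moon. It assumes the standard nondimensional Earth-Moon rotating frame of the CR3BP; the function name and layout are illustrative only and are not taken from this study.

```python
import numpy as np

MU = 0.01215058560962404  # Earth-Moon mass ratio (nondimensional CR3BP value)

def direction_observations(r_sc):
    """Unit direction vectors from the spacecraft to the Earth and the Moon,
    expressed in the CR3BP rotating frame (nondimensional units).

    r_sc : (3,) spacecraft position in the rotating frame.
    Returns a 6-element observation vector [u_earth, u_moon].
    """
    r_earth = np.array([-MU, 0.0, 0.0])      # Earth location in the rotating frame
    r_moon = np.array([1.0 - MU, 0.0, 0.0])  # Moon location in the rotating frame
    to_earth = r_earth - r_sc
    to_moon = r_moon - r_sc
    return np.concatenate([to_earth / np.linalg.norm(to_earth),
                           to_moon / np.linalg.norm(to_moon)])
```

Such direction-only observations carry no range or velocity information, which is precisely why the control policy must act on measurements rather than on a reconstructed state.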
For simplicity and computational efficiency, the primary analysis employs the Circular Restricted Three-Body Problem (CR3BP) model. This model effectively captures the essential dynamics of halo orbit motion, and its simplicity is beneficial for training reinforcement learning policies. While the proposed method can also be applied within the framework of an ephemeris model, doing so introduces additional technical complexity and a greater number of parameters affecting the spacecraft's motion.
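For reference, the CR3BP dynamics in the nondimensional rotating frame can be expressed as a compact right-hand-side function suitable for a standard ODE integrator. The sketch below is an illustrative implementation under the usual conventions (Earth at x = -mu, Moon at x = 1 - mu), not code from this study.

```python
import numpy as np

def cr3bp_rhs(t, state, mu=0.01215058560962404):
    """CR3BP equations of motion in the rotating frame (nondimensional units);
    compatible with scipy.integrate.solve_ivp.

    state = [x, y, z, vx, vy, vz]
    """
    x, y, z, vx, vy, vz = state
    r1 = np.sqrt((x + mu) ** 2 + y ** 2 + z ** 2)        # distance to the Earth
    r2 = np.sqrt((x - 1.0 + mu) ** 2 + y ** 2 + z ** 2)  # distance to the Moon
    ax = x + 2.0 * vy - (1.0 - mu) * (x + mu) / r1**3 - mu * (x - 1.0 + mu) / r2**3
    ay = y - 2.0 * vx - (1.0 - mu) * y / r1**3 - mu * y / r2**3
    az = -(1.0 - mu) * z / r1**3 - mu * z / r2**3
    return [vx, vy, vz, ax, ay, az]
```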
The essence of the proposed approach lies in a meta-reinforcement learning algorithm designed to train a control policy capable of managing a range of off-nominal scenarios. This control policy can be implemented using either recurrent neural networks or stacks of recent measurements. The emergency situations considered include engine malfunctions, missed measurements, and changes to the available measurement set. By engine malfunctions, we refer to situations where the delivered thrust magnitude deviates from the commanded value by an unknown factor; this factor is not known in advance and must be estimated by the meta-reinforcement learning algorithm.
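The fragment below is an illustrative sketch of the two mechanisms just described: a multiplicative thrust-factor malfunction hidden from the policy, and a stack of recent measurements used as the policy input. The class name, stack size, and factor range are assumptions for illustration, not values from this study.

```python
import numpy as np
from collections import deque

class MalfunctionEnvSketch:
    """Illustrative fragment: thrust-scaling malfunction and observation stacking.

    The true thrust factor is hidden from the policy; in the meta-RL setting
    the policy must infer it from the history of measurements and actions.
    """

    def __init__(self, stack_size=4, obs_dim=6):
        self.thrust_factor = np.random.uniform(0.5, 1.0)  # unknown to the policy
        self.history = deque([np.zeros(obs_dim)] * stack_size, maxlen=stack_size)

    def apply_action(self, commanded_dv):
        # Engine malfunction: the delivered impulse deviates from the command
        # by an unknown multiplicative factor.
        return self.thrust_factor * commanded_dv

    def policy_input(self, new_obs):
        # Stack of recent measurements, used in place of a recurrent hidden state.
        self.history.append(new_obs)
        return np.concatenate(list(self.history))
```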
The results include (1) the recovery costs in terms of ΔV compared to the station-keeping costs of traditional methods, (2) the influence of measurement composition on recovery success rates, and (3) the system's capability to recover from engine malfunctions without prior knowledge of the malfunction parameters. Additionally, we examine the minimum measurement set necessary for successful mission recovery, offering insights for future mission designs with limited sensor capabilities.