Our Truths are Temporary
Imagine getting a message from Mars, saying the spacecraft is communicating from the future. The year, 2053, to be precise.
Spirit’s Empty Nest. The base station that was home to the rover for 12 martian days is now considered spacecraft debris, since the rover’s wheels got dirty.
For the Spirit rover, that science fiction became science fact on the eighteenth martian morning, in the year 2004.
But unlike Rip Van Winkle, who awakened from a deep sleep to find himself in the distant future, the Spirit rover refused to go to sleep at all. When it does not sleep, the rover drains its batteries at night, a condition that threatened to put the mobile laboratory into a low-power shutdown.
Rob Manning, the Development Engineer for the risky Entry, Descent and Landing phase of both rover missions, put the problem in present perspective. During the jubilation of Opportunity’s successful touchdown, Manning remarked that, as engineers have learned from the Sol 18 glitch in Spirit’s otherwise perfect profile, "our truths are temporary."
This same aphorism was repeated the next day by both Pete Theisinger, project manager, when describing Spirit’s progress made overnight and Steve Squyres, principal investigator, when describing an early hypothesis about the Opportunity site.
A similar philosophical approach to such explorers’ challenges was adopted by a Mars Global Surveyor image team member, Bill Hartmann, when describing a difficult sky-color calibration on the Viking mission: "That amusing mistake with the first Viking 1 pictures, releasing an image with a blue sky, really was an example of what we didn’t know and why we went there and what we were learning!" Even the color of the martian sky may be a temporary truth.
A timeline of the problems on Spirit begins on Wednesday, Sol 18. "We believe in testing like you fly," said Jennifer Trosper, mission manager. "Our longest operational readiness test was nine days. But on Sol 18, a problem may have arisen with the file management system, because of the number of files on the spacecraft."
"There are not a lot of scenarios that put us in [the year] 2053 on Mars," said Trosper about the corrupted file system onboard.
The picture for Spirit proved particularly cloudy on Sol 21, three martian days later, when the spacecraft refused to answer a beep. Fortunately, later that day, Spirit detected its own fault status with enough wits to confirm that it had tried unsuccessfully to reset or reboot its operating system 77 times. At this point the team conjectured that Spirit was having a reset problem with its operating system, and revised their early diagnosis that a high-gain antenna fault or a solar flare might have caused the loss of spacecraft command and control. Even armed with this hypothesis, however, they still couldn’t put the rover to sleep.
"Based on a hunch from the rover’s lead software architect," Trosper said, the JPL mission support area sent Spirit a "hardware command, which removes flash from the operating system initialization." After ending the continuous reset loop, the rover became commandable. "It was behaving like the software we’d always known." With the risk retired of the rover not sleeping, actual software debugging could begin in earnest without risking further damage to power and thermal controls.
Mars Exploration Rover with main instruments indicated by location on the unfurled instrument after stand-up.
These problem files are stored in flash memory, similar to what one might use to store digital camera images before transfers. The panoramic camera, however, generates files that, according to the pancam’s lead scientist, Jim Bell, are "not trivial using a 20 Megapixel camera in a harsh environment…That represents three and a half years of fabrication preceded by four to five years of design."
Trosper suggested that one workaround being considered is to delete "about 100" files from flash memory, a memory cleanup beginning with old files from the cruise part of the mission. If flash memory file management turns out to be the root cause, the troubleshooting team may be able to run the rest of the mission using random access memory (RAM). "Without flash memory, there is more space in RAM," said Trosper, "because a lot of RAM is being used to manage the files stored in flash. When we don’t mount flash, data can be written to RAM, which doesn’t worry or know about whatever is happening in flash."
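A cleanup of this kind, oldest cruise files first, might be sketched as follows. The `FlashFile` record, its fields and the phase labels are hypothetical and chosen only for illustration; the article gives no detail on how the team actually selects files, beyond the figure of "about 100" and the preference for cruise-phase data.

```python
from dataclasses import dataclass


@dataclass
class FlashFile:
    name: str
    phase: str     # "cruise" or "surface" (hypothetical labels)
    age_rank: int  # lower number = older file


def pick_files_to_delete(files, limit=100):
    """Choose up to `limit` files: cruise-phase files first, oldest first."""
    # False sorts before True, so cruise files lead; ties break on age.
    ordered = sorted(files, key=lambda f: (f.phase != "cruise", f.age_rank))
    return ordered[:limit]


# Hypothetical inventory: 150 cruise files followed by 50 surface files.
inventory = [FlashFile(f"c{i}", "cruise", i) for i in range(150)]
inventory += [FlashFile(f"s{i}", "surface", 150 + i) for i in range(50)]
doomed = pick_files_to_delete(inventory)
```

With this inventory every selected file comes from the cruise phase, so no surface science data would be touched.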
"A script has been loaded to get a stack trace during initialization," said Trosper, "and validate our hunch." The team is currently loading many files onto the rover testbed in Pasadena, in an effort to probe the limits and explore workarounds in as close a case as Earth simulation can allow. Overloading a test rover with too many files may eventually push the design to what Spirit discovered on Sol 18.
For Opportunity, one lesson learned from the Spirit experience may be to delete files often, particularly those accumulated during the spacecraft’s cruise phase from Earth to Mars. "We don’t want to reach Sol 18 on Opportunity, without some recommendations," said Trosper.
"We learned alot from the end of [the 1997] Pathfinder" mission, said Trosper. Watching that rover go well beyond its expected lifetime, a slow death of sorts under harsh thermal conditions with little reserve power, showed what problems are likely to jeopardize a mission. "Spirit stayed up two nights without a low-power anomaly. It didn’t get too hot or cold. That speaks to the robustness of the design, that Spirit can recover." But when communication was a system in jeopardy, the outlook moved from critical to serious to its present status in recovery. "Pathfinder taught us how to extend a mission."
When Steve Squyres was asked if the stand-down on Spirit science would jeopardize their task to ‘follow the water’ history of Mars, he thought the rover may exceed its expected surface lifetime eventually. "The 90 day operation on Mars is when the warranty expires. The wheels don’t fall off at 91 days. Thermal and power systems are well above their margins, so we may go significantly more than 90. We planned for one out of every three Sols being blown away with non-science" or engineering-related maintenance. "Instead we went 17 for 17 with daily science" up until Sol 18. "So if we had to stand-down for thirty days, we could still reach the warranty limit with 60 days of science on Mars."
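The arithmetic behind that estimate is easy to check. The short sketch below uses only the figures Squyres quotes (a 90-sol warranty, one sol in three budgeted for non-science work, a thirty-sol stand-down); the variable and function names are illustrative.

```python
WARRANTY_SOLS = 90


def science_sols(total_sols, lost_sols):
    """Sols left for science after subtracting sols lost to other work."""
    return total_sols - lost_sols


# Original plan: one sol in every three goes to non-science work.
planned_lost = WARRANTY_SOLS // 3
planned_science = science_sols(WARRANTY_SOLS, planned_lost)

# Stand-down case: thirty sols lost in a single block instead.
stand_down_science = science_sols(WARRANTY_SOLS, 30)

print(planned_lost, planned_science, stand_down_science)  # 30 60 60
```

Both cases leave 60 science sols within the 90-sol warranty, which is why a thirty-day stand-down would only use up the margin the plan already allowed.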
Theisinger concluded that the temporary truth for Spirit function will evolve towards some kind of "rules of the road": a command sequence that enables the rover to ignore software faults that are telling it contradictory truths.