Software on the Fly

Kelly Lake, British Columbia, Canada
Friday, July 22, 2011

Mapper 2.0, available at getmapper.com, offers a way for anyone interested in Pavilion and Kelly Lakes to help classify microbialites. Credit: Rask Systems/PLRP/NASA

Much of the activity that enables the Pavilion Lake Research Project (PLRP) to reach its scientific goals happens behind the scenes. In today’s post I’m going to talk a bit about the software glue, a system known as xGDS (Exploration Ground Data System), that enables PLRP’s scientific success.

Perhaps the most crucial set of scientific data generated by PLRP is video footage. On every flight in the four years that PLRP has been flying subs – the first three years at Pavilion Lake and this year at Kelly Lake – a camera mounted to the front of the sub has captured continuous video footage. A lot of video footage. Nearly two terabytes this year alone.

This footage serves two main purposes. From a long-term perspective it catalogues the locations and depths of the microbialite structures in the lakes. This archive can be used to help scientists tease out the environmental factors that determine the distribution pattern of the microbialites.

Later in this post I’ll talk more about this archive and how you can help PLRP’s science team analyze it. But first I want to describe how this information was used by PLRP in near-real-time to guide both the planning of sub flights and the collection of microbialite samples by scuba divers.

PLRP planned for a week of sub flights this year at Kelly Lake. The first few days were allocated to completing a comprehensive survey of the lake to develop a global picture of its microbialite population. The final days were set aside for more targeted sorties, to explore specific sites noted during the first few days’ flights as being of particular scientific interest. While flights to some sample sites were planned before the field season began, based on previously acquired sonar data, others didn’t get planned until the first day’s survey flights were completed and discussed.

As I described in yesterday’s post, two groups of researchers had set up labs in Clinton to receive and process microbialite samples collected by scuba divers. But first the divers needed to know where to go to collect them. Each evening, sub pilots and other PLRP scientists reviewed the video footage captured by the subs to select useful sampling sites.

It might sound like a simple proposition to get the video off the subs, review it and make decisions about where to sample. But it wasn’t. Making the data available to pilots and scientists quickly enough to be useful required xGDS, a complex software system developed at NASA Ames Research Center in Moffett Field, California.

Tamar Cohen, David Lees, Trey Smith and Matthew Deans are the four members of the xGDS team. While waiting for the day’s sub flights to get started, Cohen described the processing required to prepare video data for the planning meeting later the same evening.

As the flights are in progress, a navigational system known as WinFrog tracks each sub’s position, noting its latitude, longitude and depth. xGDS uses that data to produce a continuously updated map, in Google Earth, of the sub’s flight path.
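For readers curious how position fixes can become a path in Google Earth, here is a minimal sketch in Python. It is not xGDS code: the function name, the input format and the sample coordinates are all made up for illustration, and the real WinFrog output format is not described in this post. The one firm detail is that KML, the file format Google Earth reads, lists each point as longitude first, then latitude.

```python
from xml.sax.saxutils import escape


def track_to_kml(name, fixes):
    """Build a KML path from (latitude, longitude, depth_m) fixes.

    Hypothetical helper: the input tuple layout is an assumption,
    not the actual WinFrog or xGDS data format.
    """
    # KML orders each coordinate as longitude,latitude (altitude is optional).
    coords = " ".join(f"{lon:.6f},{lat:.6f}" for lat, lon, _depth in fixes)
    # Depth is reported in the description rather than drawn as altitude.
    max_depth = max(depth for _lat, _lon, depth in fixes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <description>Max depth: {max_depth:.1f} m</description>\n"
        "    <LineString>\n"
        f"      <coordinates>{coords}</coordinates>\n"
        "    </LineString>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )


# Made-up fixes, for illustration only.
example_fixes = [(51.0100, -121.7800, 5.0), (51.0102, -121.7803, 12.5)]
example_kml = track_to_kml("Example flight", example_fixes)
```

Regenerating a file like this each time a new fix arrives is one simple way to get a continuously updated path; how xGDS actually refreshes its map is not covered here.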

The flight path is monitored by a science stenographer sitting aboard a “chase” boat – also known as a “nav” boat – on the lake surface. Whenever the sub pilot makes a comment about what he or she is seeing, the science stenographer transcribes it as text and it, too, then appears in Google Earth, indicated by a small envelope icon. Occasionally the pilot also takes a still shot of what he or she is looking at, and that still frame gets recorded on the same flash memory card as the video.

xGDS team member Tamar Cohen works her way through a list of modifications to the xGDS software used by PLRP. Credit: Henry Bortman

This year one of the subs was also tethered via an optical-fiber cable to a communications system that sent its video, and the pilot’s comments, back to the MMCC (Mobile Mission Control Center), 17 kilometers away in the parking lot of the Cariboo Lodge in Clinton. There a room full of scientists – the science back room – could observe the flight in real time, and enter comments of their own.

Once the flights ended, the data on the flash-memory cards had to be prepared – transferred from the card to a hard drive, “wrapped” so it could be viewed in QuickTime, and embedded with time-code information. “Just the process of getting it off of those cards, wrapping it, doing all that stuff takes about 2 hours per card,” Cohen said.

The video, the still shots, the tracking data, and everyone’s comments go into an archival database. Then, Cohen said, “We’re ready to start processing.”

I would have thought those first two hours of prepping the video from a flight counted as processing, but apparently not. Processing involved, for each comment, pulling a two-minute clip from the five hours of video captured by the sub, “and then sending that” – an automated process managed by xGDS – “to get compressed” so that it could be viewed easily over sometimes temperamental Internet connections.
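Pulling a two-minute clip for a comment comes down to choosing a start and end time within the flight video. The sketch below assumes the clip is centered on the moment of the comment and shifted inward when the comment falls near the start or end of the recording. That centering is my assumption, not something Cohen described, and the function name is hypothetical.

```python
CLIP_SECONDS = 120  # two-minute clips, as described in the post


def clip_window(note_offset_s, video_length_s, clip_s=CLIP_SECONDS):
    """Return (start, end) in seconds for a clip around a note.

    Assumes the clip is centered on the note and clamped so that it
    never runs past either end of the video.
    """
    clip_s = min(clip_s, video_length_s)        # very short videos: use all of it
    start = note_offset_s - clip_s / 2          # center the clip on the note
    start = max(0.0, min(start, video_length_s - clip_s))  # keep it inside the video
    return start, start + clip_s


FIVE_HOURS_S = 5 * 60 * 60
mid_flight = clip_window(9000, FIVE_HOURS_S)    # a note 2.5 hours into the flight
near_start = clip_window(30, FIVE_HOURS_S)      # a note 30 seconds in
```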

In an ideal world, all this would happen each day before the pilots’ meeting to plan the next day’s activities. But, said Cohen, “We usually get the cards at about 7:00. They want to start reviewing at 8:30. And remember, it takes two hours just to pull [the video] off” each memory card.

It gets worse. “This is the first year we’ve had the tether” enabling people in the MMCC to comment on one of the two flights each day. In past years, when there was no tether, “with a six-hour flight you might have 30 events from the pilot – 30 times, say, they took a picture. And the stenographer might take, say, 50 notes. So you’re talking 80 two-minute clips. It takes 10 minutes to compress each clip. So that’s 800 minutes without the back room team. With the back room team, they’re generating about 250 notes per flight.” So, meeting that 8:30 deadline: “Not gonna happen.”
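Cohen’s arithmetic is easy to check, and to extend to the tethered flights. The short sketch below uses her figures (30 pilot events, 50 stenographer notes, 10 minutes per clip, about 250 back room notes). Two things are my assumptions: that clips are compressed one at a time, and that the 250 back room notes come on top of the other 80 rather than including them.

```python
MINUTES_PER_CLIP = 10  # compression time per two-minute clip, per Cohen


def compression_hours(n_clips, minutes_per_clip=MINUTES_PER_CLIP):
    """Hours needed to compress n_clips one after another (no parallelism assumed)."""
    return n_clips * minutes_per_clip / 60


pilot_events = 30
stenographer_notes = 50
back_room_notes = 250

clips_without_back_room = pilot_events + stenographer_notes        # 80 clips
# Assumption: back room notes are additional to the pilot and stenographer notes.
clips_with_back_room = clips_without_back_room + back_room_notes   # 330 clips

print(f"{clips_without_back_room} clips: {compression_hours(clips_without_back_room):.1f} hours")
print(f"{clips_with_back_room} clips: {compression_hours(clips_with_back_room):.1f} hours")
```

Under those assumptions, 80 clips take 800 minutes, a little over 13 hours, and 330 clips take 55 hours. Either figure dwarfs the 90 minutes between receiving the cards at 7:00 and the 8:30 review.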

What Cohen and her colleagues worked out instead was a procedure – again in software – whereby single video frames were extracted for each note, in time for the day’s meeting. That, together with the notes themselves, gave the PLRP team enough information to do their planning, while the compressed-video clips were crunched for later viewing.
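The idea of doing the quick job first and deferring the slow one can be sketched with a simple priority queue. This is an illustration of the general approach only, not the xGDS implementation, which this post does not describe in detail. The job names and the queue layout are hypothetical.

```python
import heapq
from itertools import count

FRAME, CLIP = 0, 1  # lower number is processed first


def build_queue(note_offsets_s):
    """Queue a quick frame grab and a slow clip compression for every note."""
    queue, order = [], count()
    for offset in note_offsets_s:
        # The running counter keeps jobs of equal priority in arrival order.
        heapq.heappush(queue, (FRAME, next(order), "extract_frame", offset))
        heapq.heappush(queue, (CLIP, next(order), "compress_clip", offset))
    return queue


def drain(queue):
    """Yield (job, offset) pairs in priority order, emptying the queue."""
    while queue:
        _priority, _order, job, offset = heapq.heappop(queue)
        yield job, offset


# Made-up note times, in seconds from the start of the video.
jobs = list(drain(build_queue([95, 410, 1220])))
```

Every frame grab comes off the queue before any compression job, so the stills are ready for the evening meeting while the clips finish afterward.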

I assumed that this had all been figured out beforehand. I was wrong. “I just implemented this this morning,” Cohen told me. In fact, the xGDS team spent the entire field season writing new code to modify and improve the system. “I’m writing new software for about 16 or 17 hours a day,” she said. And not just in the relative calm of the small room in the MMCC where the xGDS team hangs out. Members of the xGDS team were also “doing it from the chase boats. That’s why we’re always chatting with each other.”

In other words, they’re in continuous rapid-prototyping mode. And not just at PLRP, but at other field sites, such as NEEMO and Desert RATS, where xGDS is used as well. “All the field deployments are like that,” Cohen said.

A screen generated by xGDS, showing one of the planned Kelly Lake sub flights (in orange) and the actual path the sub took (in green). Credit: xGDS/NASA/Google/Province of British Columbia

Meanwhile, in addition to producing just-in-time data products for PLRP, the xGDS team created a database of video frames – one frame from every six seconds of video – for later classification and analysis. They’ve been doing this since the first sub flight in Pavilion Lake three years ago. The database now contains well over 100,000 images.
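To get a feel for how quickly such a database grows, here is a small sketch that lists the sampling times for one frame every six seconds. The function name is hypothetical, and the five-hour flight length is borrowed from the figure mentioned earlier in this post; actual flights vary.

```python
FRAME_INTERVAL_S = 6  # one frame from every six seconds of video


def sample_offsets(video_length_s, interval_s=FRAME_INTERVAL_S):
    """Return the offsets, in seconds, at which a frame would be sampled."""
    return list(range(0, int(video_length_s), interval_s))


five_hour_flight = sample_offsets(5 * 60 * 60)
print(len(five_hour_flight))  # frames sampled from a five-hour video
```

At that rate a single five-hour video yields 3,000 frames.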

That’s a lot to analyze. And you can help.

Meet Nick Wilkinson, founder of Vancouver, Canada–based Rask Systems. Actually, if you’ve read the previous reports in this series, you’ve already met him. He’s one of the scuba divers who collected microbialite samples this year at Kelly Lake (and in previous years at Pavilion Lake). But he’s also a software developer. This year Wilkinson rolled out the first public version of Mapper, a program he wrote for classifying the contents of that huge database of video frames.

Last year he produced a version that was available to the PLRP team during the 2010 field season. This year’s version is available, at getmapper.com, to anyone who’s interested. “I really wanted to make a more public, friendly face for this,” Wilkinson explained.

“So now people can go to the website, they can set up their own account,” he said, “and they can take tutorials” to learn how the classification system works. Once you successfully identify the contents of the 25 tutorial images, you’re given 25 video frames from the database to identify. Finish that batch of 25 and you’re offered another batch. There’s a contest to see who can classify the most images.

The classifying process is broken into three “labs.” First you work in the Identification Lab, where you “tag” each photo to indicate whether it contains sediment, algae, rocks, microbialites, trees (dead trees, which are interesting because some of them have microbialites growing on them), etc. Once you graduate from the ID Lab (it takes 10 batches to graduate), you can begin work in the Algae Lab, classifying the different types of algae that inhabit the lakes. When you graduate from the Algae Lab, you can move to the Microbialite Lab, which has the most complex classification system.
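The progression through the labs can be pictured as a simple lookup on the number of batches completed. This sketch is not Mapper’s code. The 10-batch requirement for the ID Lab comes from Wilkinson’s description; the requirement for the Algae Lab is not stated in this post, so the value used below is a placeholder, and I am assuming that batches are counted cumulatively across labs.

```python
LABS = ["Identification Lab", "Algae Lab", "Microbialite Lab"]
BATCH_SIZE = 25       # video frames per batch
ID_LAB_BATCHES = 10   # batches needed to graduate from the ID Lab


def current_lab(batches_done, batches_to_graduate):
    """Return the lab a classifier is working in.

    batches_to_graduate lists the batches required to leave each lab
    except the last one.
    """
    remaining = batches_done
    for lab, needed in zip(LABS, batches_to_graduate):
        if remaining < needed:
            return lab
        remaining -= needed
    return LABS[-1]


# The second threshold is a hypothetical placeholder, not a Mapper setting.
thresholds = [ID_LAB_BATCHES, 10]
images_to_leave_id_lab = ID_LAB_BATCHES * BATCH_SIZE
```

With 25 frames per batch, graduating from the ID Lab means tagging 250 images beyond the tutorial set.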

Give it a try. It’s fun. And you’ll get the satisfaction of knowing you’re helping scientists solve The Great Microbialite Mystery of Pavilion and Kelly Lakes.