Whether scientists are investigating black holes in outer space or the inner workings of protein molecules, the success of their future research depends upon advanced computing and information transfer.
“Computer science has advanced so much in the last 10 to 20 years that it has the potential to change the way science is practiced,” says Kenneth Chiu, an associate professor of computer science at Binghamton University. His view is echoed by the National Science Foundation (NSF), which reported in March that “fewer and fewer researchers working at the frontiers of knowledge can carry out their work without cyber-infrastructure.” With a recent grant from the NSF, Chiu is working on a project he calls the CrystalGrid Framework that will help scientists make the most of this potential.
Chiu’s specialty is distributed computer systems, with a focus on cyber-infrastructure. He studies how to manage and curate (that is, store and retrieve) large amounts of data in different physical locations and formats. Because cyber-infrastructure would place the dramatic advances of information technology at the service of researchers in many disciplines, the NSF wants to create a national cyber-infrastructure that integrates advances in both hardware and software.
Experiments inside a computer
Chiu explains that cyber-infrastructure can give scientists access to databases of results that might provide new insights. It would also allow researchers to use expensive, high-power computing equipment at remote locations for their experiments. The upshot is that scientists would be able to apply mathematical models when physical results can’t be observed or replicated. “Experimentation is shifting from in vivo to in silico,” Chiu says. “Computational power has increased to the point where it can, in some situations, simulate the physical conditions accurately enough for scientists to conduct the experiment entirely inside the computer.”
Advantages of computational experimentation include speed and automation; automation removes the human intervention that can introduce errors in handling data. Drug development, for example, could benefit from computational experimentation, because researchers could first run computer tests to discover which chemicals will bind with a particular protein associated with a certain disease.
“You could, in theory, screen many compounds automatically via computer, then do the actual lab tests only on the promising ones,” Chiu says.
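The screening workflow Chiu describes can be sketched as a simple filter-and-rank step. This is a minimal illustration under stated assumptions: the compound names, the scores, and the `screen` function are all hypothetical, and the score lookup stands in for whatever docking or simulation step would actually predict binding. It only shows the "compute first, test the shortlist in the lab" idea.

```python
from typing import Callable, Iterable


def screen(
    compounds: Iterable[str],
    predicted_score: Callable[[str], float],
    threshold: float,
) -> list[str]:
    """Return compounds whose predicted binding score meets the threshold, best first.

    `predicted_score` is a hypothetical stand-in for a real simulation or docking run.
    """
    scored = [(predicted_score(c), c) for c in compounds]
    hits = [(s, c) for s, c in scored if s >= threshold]
    # Highest score first; ties broken alphabetically so the order is deterministic.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for _, c in hits]


# Made-up scores for illustration only (not real binding data).
toy_scores = {"compound_A": 0.91, "compound_B": 0.42, "compound_C": 0.77}
shortlist = screen(toy_scores, toy_scores.__getitem__, threshold=0.7)
print(shortlist)  # ['compound_A', 'compound_C']
```

Only the shortlisted compounds would then go on to physical lab tests; the expensive step is the scoring function, which this sketch deliberately leaves abstract.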
Chiu’s research team is tackling a more fundamental problem: the data itself. Most scientific information is stored in tables built around an entity and its relationships; a familiar example is the height-weight chart for people of different ages found in a doctor’s office. Chiu feels this model, borrowed from business and industry, is poorly suited to scientific data.
“The data is hard to use because it has been modified to fit into an arbitrary table, so our goal is to figure out a minimum structure in which to store it and to keep the data as raw as possible,” he says.
Chiu believes data is more reusable if distilled into its fundamental form. One such flexible form is known as a “triple,” which relates subjects, properties and values. For example, sulfur (subject) has the color (property) of yellow (value). The resulting triple is therefore “sulfur-color-yellow,” and the database that stores triples is a “triple store.”
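To make the triple idea concrete, here is a toy in-memory triple store. It is a minimal sketch, not CrystalGrid code: the `TripleStore` class, its `add`, `add_row` and `query` methods, and the property names are hypothetical. It shows two points from the text: a table row can be split into one triple per column, and a new kind of fact can be recorded without redesigning any table.

```python
from collections import defaultdict
from typing import Iterator, Optional

# A triple relates a subject, a property and a value.
Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()
        # Secondary index by subject, so questions about one entity skip a full scan.
        self._by_subject: dict[str, set[Triple]] = defaultdict(set)

    def add(self, subject: str, prop: str, value: str) -> None:
        triple = (subject, prop, value)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def add_row(self, subject: str, row: dict[str, str]) -> None:
        """Split one table row into one triple per column."""
        for prop, value in row.items():
            self.add(subject, prop, value)

    def query(
        self,
        subject: Optional[str] = None,
        prop: Optional[str] = None,
        value: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield matching triples in sorted order; a field left as None matches anything."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        for s, p, v in sorted(candidates):
            if (prop is None or p == prop) and (value is None or v == value):
                yield (s, p, v)


store = TripleStore()
store.add("sulfur", "color", "yellow")
# A new kind of fact needs no schema change: it is just more triples.
store.add_row("sulfur", {"symbol": "S", "atomic_number": "16"})

print(list(store.query(subject="sulfur", prop="color")))
# [('sulfur', 'color', 'yellow')]
```

Because every fact has the same three-part shape, the store never has to be restructured when a new property appears; a relational table, by contrast, would typically need a new column. Real triple stores add persistence, more indexes and a query language on top of this basic shape.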
A talented team
Thanks to a second NSF award in 2006, one that supports research experience for undergraduates, Chiu’s research team includes promising undergraduates as well as graduate students, many of whom are working on the CrystalGrid project.
Natan Zohar, a recent graduate, got his first exposure to research working with Chiu. That experience helped land Zohar a Seattle job with Microsoft. Under Chiu’s supervision, he wrote C++ code that allows scientific instruments to accommodate the triple-store format.
“I learned a lot about the underlying [programming] protocol,” Zohar says. “Dr. Chiu often sent me interesting pieces of code to look at.”
Doctoral student Yibo Sun is finding ways to help scientists access computers that are far away. Sun’s own research focus is on large-scale grid computing. Working on the CrystalGrid project allows him to see the big-picture application of his work. For example, he says, although a server might sit on one research campus, “scientists anywhere might log on with a specific URL to access data they store there.”
“People are making more demands for data due to the rise of the Web and the Internet, where they can get everything through links,” Chiu adds. “The long-term consequences of limiting that access are now coming into play.”
Indiana University Chemistry Partners
When Kenneth Chiu, associate professor of computer science, received a three-year National Science Foundation (NSF) award in 2005 to develop the CrystalGrid Framework project, the NSF asked him to solve one piece of a national puzzle: that is, how to create a coast-to-coast scientific cyber-infrastructure. “[The NSF] set the requirements of what’s useful to them and defined the research problem,” Chiu explains.
The application the NSF turned Chiu loose on involves X-ray crystallography, an analytical technique that beams X-rays through a single crystal of a compound. The X-rays diffract according to the molecular lattice found in the crystal, creating a distinctive diffraction pattern, a structural fingerprint that identifies the individual components. Thousands of images are needed to amass enough information to unambiguously determine the crystal’s molecular structure. Chiu’s partners in this endeavor are chemists at Indiana University in Bloomington.
Scientists have been frustrated in their crystallography research because useful data may exist in a format or database with which their computers cannot communicate. The CrystalGrid will offer a framework of Web interfaces and computers through which data can flow easily from one research group to another.
The NSF abstract of the award notes that although CrystalGrid will initially benefit a few hundred crystallography labs worldwide, “the software and methods … are intended to be reusable for any science moving from individual lab practices to a shared, global, collaboratory system.”