Date: Wednesday, 14 March 2007, 6:30 PM
Location: SAP LABS, Building D, 3410 Hillview Avenue, Palo Alto, CA
Cost: Free and open to all who wish to attend, but membership is only $10/year.
Topic
The data from scientific simulations, observations, and experiments is now being measured in terabytes and will soon reach the petabyte regime. The size of the data, as well as its complexity, makes it difficult to find useful information in it. This is, of course, disconcerting to scientists who wonder about the science still undiscovered in the data. The Sapphire scientific data mining project at Lawrence Livermore National Laboratory (http://www.llnl.gov/casc/sapphire) has been addressing this concern by applying data mining techniques to problems ranging in size from a few megabytes to a hundred terabytes in a variety of domains. Using example problems from domains including fluid mixing, molecular dynamics, astronomy, remote sensing, and experimental physics, I will discuss some of the challenges we have encountered in mining these datasets. I will then discuss what the future holds for scientific data mining as we move to petascale computing.
About the Speaker
Chandrika Kamath is a computer scientist at the Center for Applied Scientific Computing at the Lawrence Livermore National Laboratory, where she has led the Sapphire project in scientific data mining since 1998. Her research focuses on the analysis of data from observations, experiments, and simulations, using techniques from image and video processing, data mining, pattern recognition, and statistics. The Sapphire project won the 2006 R&D 100 award for its scientific data mining software. Prior to joining LLNL in 1997, Chandrika was a Consulting Software Engineer at Digital Equipment Corporation (DEC), developing high-performance mathematical software for the Digital Extended Math Library (DXML). She earned her Ph.D. in 1986 and her M.S. in 1984, both in Computer Science from the University of Illinois at Urbana-Champaign. She holds six patents in data mining, is an Editor-in-Chief of a new journal, 'Statistical Analysis and Data Mining', premiering in 2007, and is active in organizing data mining conferences and workshops.