Date: Wednesday, 14 May 2008, 6:30 PM
Location: SAP LABS, Building D, 3410 Hillview Avenue, Palo Alto, CA
Cost: Free and open to all who wish to attend, but membership is only $10/year.

Topic

Abstract: Scanning books, magazines, and newspapers has become a widespread activity because people believe that much of the world's information still resides offline. In general, after these works are scanned, they are indexed for search and processed to add links. In this talk we will describe a new approach to automatically adding links by mining repeated passages. Our technique connects semantically rich elements, so the links it creates reflect strong relations. Moreover, link targets point to a location within a work rather than to the entire work, which makes navigation easier. Our system has been run on a digital library of over 1 million books, has been used by thousands of people, and has generated the world's largest collection of quotations. We will also present a follow-on project based on the theory that authors copy passages from book to book because these quotations capture an idea particularly well: Jefferson on liberty, Stanton on women's rights, and Gibson on cyberpunk. Our Key Ideas prototype provides an interaction model in which readers fluidly explore the library by viewing popular quotations on a particular key term and following links to quotations on related key terms.

About the Speaker

Okan Kolak is a researcher at Google, working on text analysis and processing within the Book Search project. Before joining Google, Dr. Kolak was a graduate student at the University of Maryland, College Park, where he received his PhD in Computer Science for contributions to rapid resource transfer for multilingual natural language processing. He was a member of the Computational Linguistics and Information Processing Lab, the Language and Media Processing Lab, and the Center for Automation Research. His research involved statistical modeling and methods, resource acquisition and transfer using parallel corpora, machine translation, information retrieval, and optical character recognition.

Bill Schilit is a researcher at Google. Before joining Google, Schilit was principal scientist with Intel's Digital Home Product Group and codirector of Intel Research Seattle. He also managed personal computing research at Fuji-Xerox (FXPAL), worked on networked systems at AT&T's Bell Labs, and was part of the team that invented ubiquitous computing at PARC from 1992 to 1995. His interest is ubiquitous information, with a focus on developing personal and mobile technologies that support knowledge work. Schilit received a PhD in computer science from Columbia University. He is an associate editor in chief of Computer and a member of the IEEE Computer Society and the ACM. Contact him at schilit@computer.org.
