Data Capture, OCR & Content Management Make the Responsa Project Accessible to All

The Responsa Project is the most comprehensive and easily searchable digital Jewish library in the world. The project began at the Weizmann Institute in 1963 and is now managed by Bar-Ilan University in Israel. In the early 1970s, the U.S. National Endowment for the Humanities awarded a large research grant to support the project, which led to many research and development activities to expand it. The library contains the great works of Jewish wisdom, including the Bible and its principal commentaries, the Babylonian Talmud and the Jerusalem Talmud with commentaries, Midrashim, the Zohar, Rambam, the Shulchan Aruch with commentaries, and the collection of Responsa: questions and answers on matters of Jewish law.

In the first stage of the project, the focus was on the Responsa literature, a record of the questions that Jews put to their local rabbis on various subjects and the advice they received in reply. Many of these questions and answers had been collected in books, which needed to be digitized.

The second stage of the project targeted the documentation of extensive halachic, historical, sociological and economic data reflecting approximately 1,000 years of Jewish life. Because of the quantity of this material, a special committee was established to set priorities about which texts to include in the database, in light of their relevance, scope and accessibility. Some material, including ancient Hebrew scripture, was difficult to read and digitize. This created a need to invest in technology capable of extracting information from ancient materials. Building on expertise in artificial intelligence, image analysis and data mining, scientists were able to create solutions that incorporated Optical Character Recognition (OCR) capabilities, like those of NovoDynamics, which uses advanced OCR technology to extract critical information and combines that extraction with accurate language recognition technology.

Before OCR software was deployed, immense effort was committed to the manual creation of text indexes based on selected keywords. These efforts were ineffective and too time-consuming. Professor Aviezri Fraenkel believed that investing in automation would be more productive. What was needed was an OCR product, like NovoDynamics' VERUS™, that could provide superior accuracy, including the highest language recognition accuracy for Hebrew and Middle Eastern languages in the world. The committee chose to use OCR technology to extract information from the most challenging data sources, so that researchers could quickly search through large, complex data sets to find answers to their most difficult problems.

VERUS technology has been used in several similar applications, such as Project AMEEL, a joint effort by Yale and Stanford Universities to create a scholarly web-based portal for studying Middle Eastern history, culture and development, as well as by the University "St. Kliment Ohridsky" of Sofia, Bulgaria to support a new research project in Arabic Language and Linguistics. VERUS is an advanced OCR product because it delivers the highest level of accuracy when processing real-world documents, such as yellowed pages, poor copies and stained documents. It uses advanced, proprietary image processing technology that automatically cleans and orients pages before recognizing text, which provides high precision in recognizing Hebrew and Middle Eastern languages. OCR helps the Responsa Project provide access today to valuable texts spanning thousands of years of writing, from the Bible and the Talmud to current volumes.

Bar-Ilan University uses a retrieval software engine that enables classic free-text searches for Boolean word combinations using an inverted index. As a result, 300-plus digitized books are currently available within the database portal to students, researchers and the public. The public can easily run simple queries on any topic of interest using keywords and a sophisticated search and retrieval mechanism. They can also request and get a printout of every Responsum in these 300 volumes, which provide summaries of the topics. Tamar Duke-Cohan, who teaches classes about the Holocaust at Hebrew College in Newton, MA, said that she thought that the "printing functions resulted in printed pages that look very good. In addition, there is some search functionality, mostly through indexes that is helpful. Once you get the hang of these, they are quite useful."
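The Boolean free-text search over an inverted index mentioned above can be illustrated with a minimal Python sketch. The article does not describe how the Responsa retrieval engine is built, so everything here is an assumption made for illustration: the function names, the three toy English documents, and the simple tokenizer are all hypothetical and are not taken from the project's software.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    In Python 3, \\w matches Unicode letters, so Hebrew words are
    tokenized as well as English ones.
    """
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_all(index, words):
    """Boolean AND: documents that contain every one of the words."""
    postings = [index.get(word.lower(), set()) for word in words]
    if not postings:
        return set()
    return set.intersection(*postings)


def search_any(index, words):
    """Boolean OR: documents that contain at least one of the words."""
    return set().union(*(index.get(word.lower(), set()) for word in words))


def search_excluding(index, include, exclude):
    """Boolean AND NOT: all of `include` and none of `exclude`."""
    return search_all(index, include) - search_any(index, exclude)


# Hypothetical toy corpus, invented for this example only.
documents = {
    "responsum_1": "A question about coinage and trade in the market",
    "responsum_2": "A question about dress and custom",
    "responsum_3": "Trade agreements and coinage in a port city",
}
index = build_inverted_index(documents)

print(sorted(search_all(index, ["coinage", "trade"])))
print(sorted(search_any(index, ["dress", "port"])))
print(sorted(search_excluding(index, ["coinage"], ["market"])))
```

The index is built once, and each query then touches only the sets of documents listed for the query words rather than scanning every text, which is the usual reason for choosing this structure for large collections.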

In 1991, the Online Responsa was launched and also released on CD for easy accessibility. The project not only contains useful content for the Torah scholar but also gives the general public an easy, user-friendly way to learn what life was really like in different eras: how people dressed, the coinage of the time, the mercantile systems and, of course, the use of language. For instance, a historian may use the search functions of the Responsa Project to shed light on various aspects of a certain era. There are indexes to many halakhic works, as well as keyword combinations in more modern and standardized language. There is also a "simulated thesaurus" that assists users who are not familiar with technical terminology.

To date, over 300 man-years have been invested in developing the cutting-edge software behind the Responsa Project and in data entry and verification.

For more information on the Responsa Project or on NovoDynamics, Inc., visit www.novodynamics.com.