About DAR
What is DAR?
The Digital Assets Repository (DAR) is an
ecosystem of components developed by the International School of Information
Science (ISIS) at the Bibliotheca Alexandrina (BA) to create an institutional
repository that maintains the Library’s digital collections. Its flexible
architecture lets DAR accommodate and archive any media type. It also provides
public access to digitized collections through a web-based search and browsing
facility.
Why DAR?
DAR was built mainly to
support the creation, use and preservation of a variety of digital resources.
It provides management tools that make it easier to create, manage
and share the Library’s digital assets. The system is based on evolving
standards and can easily be integrated with other systems.
Accessing Books on
DAR currently encompasses the
largest Arabic book collection. Out-of-copyright books are fully
available on the Internet. For in-copyright books,
Internet users can browse only 5% of the book, with a minimum of 10 pages.
In addition, the system limits simultaneous access to an in-copyright book
to the number of physical copies available at the BA. That is, if the BA has
purchased two copies of a book, only two users can access the digital copy
at the same time. Another user can gain access only once one of them
releases the book.
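The two access rules above can be sketched in a few lines. This is only an illustration of the stated policy, not DAR's implementation: the names `preview_page_limit` and `CopyLimitedBook` are hypothetical, and rounding the 5% upward and capping the preview at the length of the book are assumptions the text does not spell out.

```python
import threading


def preview_page_limit(total_pages, percent=5, minimum=10):
    """Pages of an in-copyright book that an Internet user may browse."""
    # Integer arithmetic for ceil(total_pages * percent / 100); rounding up is an assumption.
    share = (total_pages * percent + 99) // 100
    # At least `minimum` pages, but never more than the book contains (assumption).
    return min(total_pages, max(minimum, share))


class CopyLimitedBook:
    """Admits as many simultaneous readers as there are physical copies."""

    def __init__(self, physical_copies):
        self._physical_copies = physical_copies
        self._readers = set()  # ids of users currently holding the book
        self._lock = threading.Lock()

    def open(self, user_id):
        """Return True if the user may read now, False if all copies are in use."""
        with self._lock:
            if user_id in self._readers:
                return True
            if len(self._readers) >= self._physical_copies:
                return False
            self._readers.add(user_id)
            return True

    def release(self, user_id):
        """Give the copy back so that another user can open the book."""
        with self._lock:
            self._readers.discard(user_id)
```

With two physical copies, a third call to `open` returns `False` until one of the first two readers calls `release`.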
DAR Main Usability Features
DAR lets users choose between different viewing
options, search for a keyword or expression, tag books, share them on
social networks, rate them, and interact with other users by
submitting comments. Users may also place books of their choice into different
folders, thus creating their own "Bookshelves". Annotation tools are
available for highlighting or underlining spans of text, adding
sticky notes, etc. Moreover, when a user searches for a book, the system
displays several options for narrowing down the search results, a technique
generally known as Faceted Search.
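As a rough illustration of faceted search in general (not of DAR's own code), the sketch below counts how many results fall under each value of a field and then narrows the result list by the values a user picks. The sample records and the field names `language` and `subject` are made up for the example.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(book[field] for book in results if field in book)


def narrow(results, **selected):
    """Keep only the results that match every selected facet value."""
    return [
        book for book in results
        if all(book.get(field) == value for field, value in selected.items())
    ]


# Made-up search results for illustration only.
results = [
    {"title": "Book A", "language": "Arabic", "subject": "History"},
    {"title": "Book B", "language": "Arabic", "subject": "Science"},
    {"title": "Book C", "language": "French", "subject": "History"},
]

language_facet = facet_counts(results, "language")
arabic_history = narrow(results, language="Arabic", subject="History")
```

The counts are what a search page would show next to each option; selecting an option corresponds to a call to `narrow`.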
Technical Aspects of DAR
DAR Modules:
DAR consists of several modules:
The Digital Assets Factory (DAF) provides flexible management of the digitization workflow and a
unified means of ingestion into the system. It supports both physical and born-digital materials of
different media types. It integrates easily with automated and human phases,
checking integrity at each step of the workflow.
DAF is available for download at: DAFWiki
The Digital Assets Metadata (DAM) module manages the metadata of the objects within
the repository. It consists of a metadata store for METS, in addition to using
Fedora for metadata management. The system provides flexible metadata editing
through XML templates and dynamic forms. It also allows
synchronization, likewise based on XML templates, with different integrated
library systems (ILS) or other data sources (e.g. an application backend).
The Digital Assets Keeper (DAK) is a storage layer for
digital objects responsible for caching, versioning and load balancing.
The Digital Assets Publishing layer (DAP) is a RESTful API for building applications
on top of the Repository. Applications can query for new or updated metadata and files,
and can also access a slice of the data in the Repository based on their access rights.
A Discovery Layer provides full-text
search across the whole collection, restricted by the access rights granted to
the user. Full-text search is built on Solr with support for five languages:
Arabic, English, French, Spanish and Italian.
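One common way to combine full-text search with access rights in Solr is to send the user's text as the main query (`q`) and the rights as filter queries (`fq`). The sketch below only builds such a request URL. The base URL, the field names `content`, `access_group` and `language`, and the function `build_select_url` are assumptions made for illustration and are not taken from DAR; group names are assumed to be simple tokens that need no escaping.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, text, allowed_groups, language=None, rows=10):
    """Build a Solr /select URL that filters results by the user's access groups."""
    if not allowed_groups:
        raise ValueError("at least one access group is required")
    params = [
        ("q", text),         # the user's search text
        ("df", "content"),   # default field to search (hypothetical field name)
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    # Filter query: only documents visible to one of the user's groups.
    groups = " OR ".join(sorted(allowed_groups))
    params.append(("fq", f"access_group:({groups})"))
    if language:
        params.append(("fq", f"language:{language}"))
    return f"{base_url}/select?{urlencode(params)}"


# Usage with a hypothetical local Solr core named "books".
url = build_select_url(
    "http://localhost:8983/solr/books", "مكتبة", {"staff", "public"}, language="ar"
)
sent = parse_qs(urlsplit(url).query)
```

Keeping the rights in `fq` rather than in `q` leaves relevance scoring to the search text alone, and Solr can cache the filter separately.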
DAR is open source and has been
deployed at the Bibliotheca Alexandrina’s Digital Laboratory since January
2007. DAFv2 manages the entire digitization process, including its various
phases, system users, file movement, archiving, and integration with the ILS
and the Library’s digital repository. This version also supports dynamic
workflow evolution and deviation to allow for exception handling, and provides
history tracking of actions and the flexibility to manage multiple projects
with a diversity of materials simultaneously. It also supports the
ingestion of a job in the middle of the workflow, and it allows easy
integration of the tools used to perform functions of the workflow.
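DAF is described above as checking integrity at each step of the workflow. A typical way to do this is to carry a checksum from one step to the next and verify it before the step runs. The following is a minimal sketch under that assumption; the choice of SHA-256, the function names and the two example steps are hypothetical and do not describe DAF's actual mechanism.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def run_step(name, step, data, expected_digest):
    """Verify the input, run one workflow step, return (output, digest of output)."""
    if sha256_of(io.BytesIO(data)) != expected_digest:
        raise ValueError(f"integrity check failed before step {name!r}")
    output = step(data)
    return output, sha256_of(io.BytesIO(output))


# Usage: two hypothetical steps chained, each verified against the previous digest.
scan = b"raw page image bytes"
digest = sha256_of(io.BytesIO(scan))
cropped, digest = run_step("crop", lambda d: d[4:], scan, digest)
archived, digest = run_step("archive", lambda d: d, cropped, digest)
```

If a file is altered between two steps, the next call to `run_step` raises an error instead of passing the damaged data further down the workflow.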
Optical Character Recognition
A digital Book Viewer displays
the books using image-on-text technology. Research was carried out in
co-operation with Arabic OCR producers to achieve efficient, high-quality
recognition for mass OCR production of Arabic content, reaching an
accuracy ranging from 90% to 97%. Although this accuracy is not high enough for
users to read the OCR output directly, it is good enough for searching.
Therefore, the BA has concentrated its efforts on publishing books with the
text layer behind the image, so that the text can be searched while the
image is shown to the user. Full-text, content-based search is performed on the whole
collection of available books.
The Book Viewer
The book viewer provides several features for the user’s
convenience, such as:
- Full text (morphological) search within the book's title, subject, keywords, and content;
- Highlighting of search results within the book;
- Single-page or two-page view;
- Annotation tools: highlighting, underlining and sticky notes;
- Streaming: one page is displayed at a time to make the book easier to view over a slow Internet connection;
- Multilingual interface.
The Digital Lab
DAR is also concerned with the
digitization of materials already available in the Library or acquired from
other institutions. A digitization laboratory was built for this purpose at the
Bibliotheca Alexandrina. The lab is equipped with state-of-the-art
technologies for digitizing different types of material, including slides in
multiple formats, negatives, books, manuscripts, pictures and maps, audio and
video. The complete workflow cycle for producing digital objects has been
automated and integrated with the BA Library Information System.