Querylog-based Assessment of Retrievability Bias in Delpher

On March 17, we were invited by the National Library of the Netherlands to present the results of our study on retrievability bias in the Dutch historic newspaper archive.
The research was conducted in collaboration with the WebART project and will be presented at the Joint Conference on Digital Libraries (JCDL) 2016 in Newark, USA, in June 2016.

Summary of the talk:

Search engines are not “objective” pieces of technology, and bias in Delpher’s search engine may or may not harm user access to certain types of documents in the collection. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the “retrievability” of all documents in a collection. We explain the ideas underlying the retrievability metric and how we measured it on the KB Newspaper collection. We describe and quantify the retrievability bias imposed on the newspaper collection by three commonly used Information Retrieval models. For this, we investigate how document features such as length, type, or publication date influence retrievability.
We also investigate the effectiveness of the retrievability measure itself. Two characteristics set our experiments apart from previous studies: (1) the newspaper collection contains noise originating from OCR processing and from historical spelling and language use; and (2) instead of the simulated queries used in other studies, we use real user query logs, including click data. We show how simulated queries differ from real user queries in term frequency and in the prevalence of named entities, and how this affects the results of a retrieval task.
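To make the metric concrete, here is a minimal sketch of cumulative retrievability as it is commonly defined in the IR literature (Azzopardi and Vinay, 2008): a document's score r(d) is the sum, over all queries q, of a query weight o_q for every query that ranks d within a cutoff c. With real query logs, o_q can simply be how often a query occurs. The Gini coefficient is a common way to summarize how unequally retrievability is spread over a collection. The function names, the toy queries and documents, and the cutoff are illustrative assumptions, not the exact setup of our study.

```python
def retrievability(collection, ranked_results, query_weights, cutoff):
    """Cumulative retrievability r(d) = sum over q of o_q * f(k_dq, c).

    collection:     iterable of all document ids (documents never retrieved keep r = 0)
    ranked_results: dict mapping each query to its ranked list of document ids
    query_weights:  dict mapping each query to o_q, e.g. its frequency in a query log
    cutoff:         rank cutoff c; f(k, c) = 1 if k <= c, else 0
    """
    r = {d: 0 for d in collection}
    for query, docs in ranked_results.items():
        weight = query_weights.get(query, 1)  # default: every query counts once
        for d in docs[:cutoff]:  # only documents ranked within the cutoff count
            r[d] = r.get(d, 0) + weight
    return r


def gini(values):
    """Gini coefficient: 0 when all documents are equally retrievable, near 1 when few dominate."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum_j x_j), with x sorted ascending and i = 1..n
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


collection = ["d1", "d2", "d3", "d4"]
results = {"amsterdam": ["d1", "d2", "d3"], "delpher": ["d1", "d4"]}
weights = {"amsterdam": 3, "delpher": 1}
r = retrievability(collection, results, weights, cutoff=2)
print(r)                 # {'d1': 4, 'd2': 3, 'd3': 0, 'd4': 1}
print(gini(r.values()))  # 0.4375
```

Note that every document in the collection must be included, even those that no query retrieves: a document with r(d) = 0 is exactly the kind of "invisible" document the bias analysis is meant to expose.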


Stitch by Stitch: Annotating Fashion at the Rijksmuseum

Rijksmuseum – Modemuze – COMMIT/ SealincMedia – Wikimedia Nederland

Saturday 23rd April 2016 – Cuypers Library, the Rijksmuseum

Fashion heritage, collected over centuries, can be found everywhere in museums: costumes, accessories, paintings, prints and photographs. But while some clothes and accessories are easily found and identified, others are obscure and require a trained eye to describe. What are we looking at? What kind of sleeve is this? Which materials and techniques have been used? More specific descriptions of the images facilitate better use of digital collections and enable users to wander through them in detail.
The Rijksmuseum and Modemuze are looking for specialists and enthusiasts with a passion for fashion and costume to join an expedition through their digital collections.

Modemuze is an online platform and network of 11 Dutch museums with fashion and costume collections: Amsterdam Museum, Centraal Museum Utrecht, Fries Museum Leeuwarden, Gemeentemuseum Den Haag, Museum Rotterdam, Paleis Het Loo, Rijksmuseum, Tassenmuseum Hendrikje, TextielMuseum, Theatercollectie Bijzondere Collecties UvA, Tropenmuseum, Afrika Museum, Museum Volkenkunde.

Annotating the collections

Researchers from VU University Amsterdam, Delft University of Technology, the Centre for Mathematics and Informatics and the Rijksmuseum (in the context of the COMMIT/ SEALINCMedia project) have developed Accurator: an online tool that supports the annotation of digital collection objects, for example by helping users find relevant objects to annotate and by letting them annotate specific parts of an object. Following ‘Birdwatching in the Rijksmuseum’, the Accurator tool will this time be used to describe fashion-related objects from the Modemuze and Rijksmuseum collections.

Participants in the fashion annotation event are also invited to record their findings in Wikipedia, Wikimedia Commons and Wikidata, Wikipedia’s open database. Wikipedia volunteers as well as staff from the Rijksmuseum and Modemuze will be present for support throughout the day.

Program

9:30 – 10:00 Registration and coffee
10:00 – 10:10 Introduction of the Accurator tool
10:10 – 12:00 Annotating fashion in the digital collections (using Accurator)
12:00 – 12:30 Discussion on the use and future of fashion annotation

Participation in the event is free, but registration is required. To register, please send an email to: accurator@rijksmuseum.nl with your name and your interest in fashion. (We will take your subject preferences into account when setting up the Accurator tool.) If you have any questions regarding the event, please feel free to email them to this address.

IMPORTANT: The event will take place in the Cuypers Library of the Rijksmuseum. The following guidelines need to be taken into account:

  • On the 23rd of April, report to the RIJKSMUSEUM desk in the Atrium and bring your confirmation of registration; without it, you may be denied entry.
  • Bring your own laptop. Strict safety guidelines in the Library limit the use of laptop power supplies, so please make sure your battery is fully charged and can last 3 hours without recharging.

www.accurator.nl (general info)
annotate.accurator.nl (annotation tool)
www.modemuze.nl
www.rijksmuseum.nl
wm.cs.vu.nl (Web & Media group at VU)

DigiBird kickoff meeting

On the 5th of February 2016 the kickoff meeting for the COMMIT/ valorisation project DigiBird took place. The meeting was hosted by the Netherlands Institute for Sound and Vision (Nederlands Instituut voor Beeld en Geluid). During the meeting, the people who will work on the project were introduced, together with the partners involved.

The DigiBird project builds on the results of the SEALINCMedia project, aiming to use crowdsourcing results to integrate three different media types: images, sounds and videos – all related to birds. The various datasets that belong to these different media types are provided by the partners involved in the project. Most of these platforms already use crowdsourcing as a means of annotating the bird media, but there is no single point of access for all of them and no means of crossover access. Thus, the goal of DigiBird is to achieve this integration by creating cross-links between collections and designing user-friendly interfaces. These will not only help to enable access to the various bird collections, but will also motivate people to contribute more knowledge by means of annotations.
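One simple way to picture such cross-links is to group items from different collections by the species that crowdsourced annotations say they depict. The sketch below is only an illustration under that assumption: the `MediaItem` record, the collection names, the item ids and the matching on a normalized scientific name are all hypothetical and do not describe the project's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaItem:
    collection: str  # hypothetical name of the partner collection the item comes from
    media_type: str  # "image", "sound" or "video"
    item_id: str
    species: str     # crowdsourced species annotation, as a scientific name


def normalize(name):
    # Collapse whitespace and case so "Erithacus  rubecula" and "erithacus rubecula" match
    return " ".join(name.split()).lower()


def cross_links(items):
    """Group items by annotated species; keep groups that span at least two collections."""
    by_species = defaultdict(list)
    for item in items:
        by_species[normalize(item.species)].append(item)
    return {
        species: group
        for species, group in by_species.items()
        if len({item.collection for item in group}) > 1
    }


items = [
    MediaItem("collection_a", "image", "a-17", "Erithacus rubecula"),
    MediaItem("collection_b", "sound", "b-42", "erithacus  rubecula"),
    MediaItem("collection_c", "video", "c-03", "Alcedo atthis"),
]
for species, group in cross_links(items).items():
    print(species, [(i.collection, i.media_type) for i in group])
# erithacus rubecula [('collection_a', 'image'), ('collection_b', 'sound')]
```

Matching on plain names is fragile (synonyms, common names in different languages), which is one reason a shared identifier for each species would be a more robust key in practice.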

The people who will work on developing this project are Chris Dijkshoorn, a PhD student, and Cristina-Iulia Bucur, a student assistant, both affiliated with VU University Amsterdam.

The partners involved in DigiBird are:

During the meeting, a hands-on breakout session took place, in which participants from the various partners sketched their own ideas of how the interfaces could look and, by building scenarios, explored how user interaction could be handled.

Impact Analysis of OCR Quality on Research Tasks in Digital Archives

We presented our paper on “Impact Analysis of OCR Quality on Research Tasks in Digital Archives” at this year’s International Conference on Theory and Practice of Digital Libraries (TPDL2015).

We describe how humanities scholars currently use digital archives and the challenges they face in adapting their research methods compared to using a physical archive. The required shift in research methods comes at a cost: scholars have to work with digitally processed historical documents. A major concern for scholars is therefore the question of how much trust they can place in analyses based on noisy representations of source texts.

Based on interviews with humanities scholars and a literature study, we classify scholarly research tasks according to their susceptibility to errors originating from OCR-induced biases. Search results for “Amsterdam”, for example, are likely to be influenced by the confusion of the letters “s” and “f”, especially for material created before 1800, when the “long s” (ſ), which closely resembles an “f”, was still in use.
To reduce the impact of such errors, we investigated which data would be required and whether it is available in the archive.
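As an illustration of the “s”/“f” problem, a search tool could also query the likely OCR misreadings of a word. The minimal sketch below enumerates them; it is not the method from our paper, the function name is hypothetical, and the rule is a simplification (historical long-s usage had more exceptions than “not at the end of a word”).

```python
from itertools import product


def long_s_variants(word):
    """Spellings of `word` in which any non-final lowercase 's' is read as 'f'.

    Simplification: the long s was a lowercase form not normally used at the
    end of a word, so a final 's' and a capital 'S' are left unchanged.
    """
    options = [
        ("s", "f") if ch == "s" and i < len(word) - 1 else (ch,)
        for i, ch in enumerate(word)
    ]
    # product() yields every combination, so k candidate positions give 2**k variants
    return {"".join(chars) for chars in product(*options)}


print(sorted(long_s_variants("Amsterdam")))  # ['Amfterdam', 'Amsterdam']
```

Because the number of variants doubles with every candidate “s”, expanded queries can also match unrelated words, which is why knowing how often such misreadings actually occur in the archive matters.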

We describe our study of example research tasks performed on the digital newspaper archive of the National Library of The Netherlands. In this study, we tried to reduce the uncertainty of the results as much as possible with the data publicly available in the archive.

We conclude that the current state of knowledge, on the scholars’ side as well as on the side of tool makers and data providers, is insufficient and needs to be improved.

Birdwatching Rijksmuseum

Rijksmuseum – Naturalis Biodiversity Center – Wikimedia Nederland & COMMIT/SealincMedia present a unique birdwatching event

Birds are everywhere: in your own garden, in nature, and also in art. Among the Rijksmuseum’s 1.2 million collection objects are many prints, paintings and artefacts that depict bird species. Among the 37 million objects in the Naturalis collection are many birds from all over the world that have been collected over the last 200 years, as well as historical drawings of plants and animals in which many birds are depicted.

Wikimedia Commons

Some of the depicted birds are easily identified. Others require a trained eye to determine which species the artist has pictured. The Rijksmuseum and Naturalis are looking for experienced bird watchers and other avian enthusiasts to join an expedition through their digital collections and help the museums identify bird species in works of art.

The Rijksmuseum and Naturalis are currently in the process of donating large parts of their digitized collections of bird images to Wikimedia Commons, Wikipedia’s open multimedia library. Participants of the birdwatching day are challenged to collaboratively identify as many of the bird species depicted in these images as possible and record them in Wikimedia Commons and in Wikidata, Wikipedia’s open database.

Accurator

For this purpose, COMMIT/SealincMedia, a consortium of Dutch researchers from the VU University Amsterdam, Delft University of Technology and the Centrum Wiskunde & Informatica (centre for mathematics and informatics), has developed a dedicated online tool for the Rijksmuseum. With this tool, called Accurator, common and scientific names of species depicted in artworks can be recorded in an intuitive way. Participants of the birdwatching day will use this tool to tag bird species. Wikipedia volunteers as well as curators from the Rijksmuseum and Naturalis will be present for support throughout the day.

During the birdwatching day, the Rijksmuseum, Naturalis, Wikimedia Netherlands (the organization behind the Dutch version of Wikipedia) and the COMMIT/SealincMedia researchers want to learn how best to collect your knowledge as a bird enthusiast and apply it to enrich our art collections. We also hope to learn how we can make Accurator more user-friendly.

Can’t wait? Here’s a preview of birds in the Rijksmuseum! https://www.rijksmuseum.nl/nl/zoeken?f.publish.apiCollection=BIRDS
You can start adding information using Accurator here: http://annotate.accurator.nl

Program

10.00 Start
10.10 Introduction and presentation Erik Hinterding, birdwatcher and curator Rijksmuseum
10.30 Presentation Steven van der Mije, head Vertebrate collections Naturalis
10.50 Introduction editing Wikipedia
11.15 Introduction Accurator
11.30 Start edit-a-thon
13.30 Wrap-up by Erik Hinterding
14.00 End

14.30 Tour (optional, registration required)

Registration

Participation is free, but due to the limited number of available places, registration is required and can be done via https://goo.gl/3rSKNa. For questions, mail us at vogelen@rijksmuseum.nl.

Important

The event will take place in the Cuypers Library of the Rijksmuseum. The following guidelines need to be taken into account:

1. No food, drinks or smoking allowed.
2. Due to the limited number of available places, registration in advance is required. On the 4th of October, report to the desk in the Atrium and bring your confirmation of registration; without it, you may be denied entry.
3. If possible, bring your own laptop, but note that the Library has very strict safety guidelines: you cannot use your own power supply or adapter unless your device is less than a year old (please bring proof of purchase). Make sure the battery is fully charged.

Historical Newspapers as “Big Data”

On Tuesday, March 24, 2015, the National Library of The Netherlands organized a symposium on the use of digitized newspapers in the Digital Humanities. The goal of the symposium was to engage information specialists and end users in a discussion with the KB on future possibilities of using the (data in the) digital newspaper archive.

We presented our research ideas on estimating the impact of OCR errors on research tasks.

For more information about the event, please see the event report on the KB website.
