An overview of the transcription process

The aim of this project is to extract structured data from digital copies of identification documents used in the administration of the White Australia Policy. To do this we’re creating a website using the open-source Scribe Framework.

Scribe is designed to work with structured data — the sort you find in forms and certificates. It breaks the transcription process down into three main tasks:

  • Marking
  • Transcribing
  • Verifying

Marking identifies the fields that need to be transcribed. All you do is choose the field from a list and then draw a box around the corresponding value in the document. You might also be asked to mark things like photos, handprints, and Chinese characters.

Marking fields

Transcribing is fast and fun! All you do is type what you see in the highlighted area of the document. The handwriting can be tricky at times, but you don’t need to worry too much as each field is transcribed multiple times for accuracy. Just try your best!

Transcribing marked fields

If transcribers disagree about the contents of a field, it’s sent to an additional verification stage, where you vote on the most accurate transcription. Once a broad consensus is reached, the field is tagged as complete.
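
For the curious, here’s a rough sketch of how a consensus check like this could work. It is not Scribe’s actual code: the function name `resolve_field`, the status labels, and the 75% agreement threshold are all assumptions made up for illustration.

```python
from collections import Counter


def resolve_field(transcriptions, min_agreement=0.75):
    """Decide what happens next to one marked field.

    Hypothetical helper: the threshold and status names are
    illustrative assumptions, not settings taken from Scribe.
    """
    if not transcriptions:
        return ("needs_transcription", None)

    # Collapse stray whitespace so "Sydney " and "Sydney" count as the same answer.
    cleaned = [" ".join(text.split()) for text in transcriptions]

    # Find the most common answer and how many transcribers gave it.
    value, count = Counter(cleaned).most_common(1)[0]

    if count / len(cleaned) >= min_agreement:
        return ("complete", value)

    # Too much disagreement: hand the field over to the voting stage.
    return ("needs_verification", None)


print(resolve_field(["Sydney", "Sydney ", "Sydney"]))
print(resolve_field(["Ah Chong", "Ah Cheong"]))
```

In this sketch the first field would be marked complete because every transcriber agrees, while the second would go to verification because the two transcriptions differ.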

Verifying transcribed data

Almost ready to go!

The transcription site is almost ready. We’ll be posting a link here on Saturday, 9 September. You can also follow @invisibleaus and like our Facebook page for updates.

The site will be open to all willing volunteers — wherever you are around the world you can help us document and understand the lives of people living under the restrictions of the White Australia Policy.

If you’re in Canberra on 9-10 September, bring your laptop along to the Museum of Australian Democracy at Old Parliament House where we’ll be running a transcribe-a-thon. How many documents can we transcribe in a weekend?
