The aim of this project is to extract structured data from digital copies of identification documents used in the administration of the White Australia Policy. To do this, we’re creating a website using the open-source Scribe framework.
Scribe is designed to work with structured data — the sort you find in forms and certificates. It breaks the transcription process down into three main tasks:
Marking identifies the fields that need to be transcribed. All you do is choose the field from a list and then draw a box around the corresponding value in the document. You might also be asked to mark things like photos, handprints, and Chinese characters.
Transcribing is fast and fun! All you do is type what you see in the highlighted area of the document. The handwriting can be tricky at times, but you don’t need to worry too much as each field is transcribed multiple times for accuracy. Just try your best!
Verifying settles any disagreements. If transcribers disagree about the contents of a field, it’s sent to an additional verification stage, where you vote on the most accurate transcription. Once a broad consensus is achieved, the field is tagged as complete.
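For the technically curious, the consensus idea described above can be sketched in a few lines of Python. This is only an illustration and not Scribe's actual code: the function name `resolve_field`, the status labels, and the 75% agreement threshold are all assumptions made up for this example.

```python
from collections import Counter


def resolve_field(transcriptions, min_agreement=0.75):
    """Decide what happens to one field given its transcriptions.

    Hypothetical helper: the name, the status strings, and the default
    threshold are illustrative assumptions, not part of Scribe.
    """
    # Collapse runs of whitespace so "Sydney " and "Sydney" count as the same.
    cleaned = [" ".join(t.split()) for t in transcriptions if t.strip()]
    if not cleaned:
        return ("needs_transcription", None)

    # Find the most common transcription and how many people entered it.
    value, votes = Counter(cleaned).most_common(1)[0]

    # Enough agreement: the field is done. Otherwise volunteers vote on it.
    if votes / len(cleaned) >= min_agreement:
        return ("complete", value)
    return ("needs_verification", None)


print(resolve_field(["Sydney", "Sydney ", "Sydney"]))  # ('complete', 'Sydney')
print(resolve_field(["Sydney", "Sidney"]))  # ('needs_verification', None)
```

In this sketch, a field that three people transcribe identically is marked complete straight away, while a split between two readings is passed on for a vote.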
Almost ready to go!
The site will be open to all willing volunteers — wherever you are around the world you can help us document and understand the lives of people living under the restrictions of the White Australia Policy.
If you’re in Canberra on 9-10 September, bring your laptop along to the Museum of Australian Democracy at Old Parliament House where we’ll be running a transcribe-a-thon. How many documents can we transcribe in a weekend?