Introduction to OpenRefine

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What is OpenRefine? What can it do?

Objectives
  • Explain what the OpenRefine software does

  • Explain how the OpenRefine software can help work with data files

  • Introduction Slides

What is OpenRefine?

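OpenRefine is described as ‘a tool for working with messy data’. It works best with data in a simple tabular format, such as a spreadsheet. It can help you:

  • Split data up into more granular parts

  • Match local data up to other data sets

  • Enhance a data set with data from other sources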

Some common scenarios might be:

| Data you have | Desired data |
|---|---|
| 1st January 2014 | 2014-01-01 |
| 01/01/2014 | 2014-01-01 |
| Jan 1 2014 | 2014-01-01 |
| 2014-01-01 | 2014-01-01 |
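
To make the date scenario concrete, here is a minimal Python sketch of the same clean-up done outside OpenRefine. It is only an illustration of the transformation, not how OpenRefine implements it. The function name `to_iso_date`, the list of accepted formats, and the day-first reading of `01/01/2014` are all assumptions based on the four sample rows.

```python
import re
from datetime import datetime

# Formats seen in the sample rows; reading 01/01/2014 as day-first is an assumption.
KNOWN_FORMATS = ["%d %B %Y", "%d/%m/%Y", "%b %d %Y", "%Y-%m-%d"]


def to_iso_date(text):
    """Return text as YYYY-MM-DD, or None if no known format matches."""
    # Drop ordinal suffixes so "1st" parses as the day number "1".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text.strip())
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None


for raw in ["1st January 2014", "01/01/2014", "Jan 1 2014", "2014-01-01"]:
    print(raw, "->", to_iso_date(raw))
```

Month names are parsed using the current locale, so this sketch assumes English month names. Values that match none of the listed formats come back as `None` rather than being guessed at.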

| Data you have | Desired data |
|---|---|
| London | London |
| London] | London |
| London,] | London |
| london | London |
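
The place-name scenario comes down to removing stray punctuation and normalising capitalisation. A minimal Python sketch of that idea follows. The function name `clean_place_name` is hypothetical, and the set of characters to strip is an assumption drawn from the sample rows. In OpenRefine itself you would do this kind of clean-up through its own interface instead.

```python
def clean_place_name(value):
    """Strip stray trailing punctuation and normalise capitalisation."""
    # Remove surrounding whitespace, then any trailing commas, brackets, or spaces.
    trimmed = value.strip().rstrip(",] ")
    # Title-case so "london" and "London" end up identical.
    return trimmed.title()


for raw in ["London", "London]", "London,]", "london"]:
    print(repr(raw), "->", clean_place_name(raw))
```

Title-casing is a blunt rule: it would also capitalise the small words in hyphenated names such as Stoke-on-Trent, so real data usually needs checking after a step like this.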

| Address in single field | Institution | Library name | Address 1 | Address 2 | Town/City | Region | Country | Postcode |
|---|---|---|---|---|---|---|---|---|
| University of Wales, Llyfrgell Thomas Parry Library, Llanbadarn Fawr, ABERYSTWYTH, Ceredigion, SY23 3AS, United Kingdom | University of Wales | Llyfrgell Thomas Parry Library | Llanbadarn Fawr | | Aberystwyth | Ceredigion | United Kingdom | SY23 3AS |
| University of Aberdeen, Queen Mother Library, Meston Walk, ABERDEEN, AB24 3UE, United Kingdom | University of Aberdeen | Queen Mother Library | Meston Walk | | Aberdeen | | United Kingdom | AB24 3UE |
| University of Birmingham, Barnes Library, Medical School, Edgbaston, BIRMINGHAM, West Midlands, B15 2TT, United Kingdom | University of Birmingham | Barnes Library | Medical School | Edgbaston | Birmingham | West Midlands | United Kingdom | B15 2TT |
| University of Warwick, Library, Gibbett Hill Road, COVENTRY, CV4 7AL, United Kingdom | University of Warwick | Library | Gibbett Hill Road | | Coventry | | United Kingdom | CV4 7AL |
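
Splitting a single address field into columns can be sketched in Python as below. This is a rough illustration under strong assumptions taken from the four sample rows: the first two comma-separated parts are the institution and library name, the last two are the postcode and country, and the town is the one part written entirely in upper case. The function name `split_address` is hypothetical, and real address data would need more careful rules.

```python
def split_address(address):
    """Split a comma-separated address into named columns (heuristic)."""
    parts = [p.strip() for p in address.split(",")]
    institution, library = parts[0], parts[1]
    postcode, country = parts[-2], parts[-1]
    middle = parts[2:-2]

    # Assumption: the town is the only part written entirely in upper case.
    town_index = next((i for i, p in enumerate(middle) if p.isupper()), None)
    if town_index is None:
        raise ValueError("no upper-case town found in: " + address)

    street = middle[:town_index]          # Address lines before the town.
    after_town = middle[town_index + 1:]  # Region, if present.
    return {
        "Institution": institution,
        "Library name": library,
        "Address 1": street[0] if len(street) > 0 else "",
        "Address 2": street[1] if len(street) > 1 else "",
        "Town/City": middle[town_index].title(),
        "Region": after_town[0] if after_town else "",
        "Country": country,
        "Postcode": postcode,
    }


row = split_address(
    "University of Birmingham, Barnes Library, Medical School, Edgbaston, "
    "BIRMINGHAM, West Midlands, B15 2TT, United Kingdom"
)
for column, value in row.items():
    print(column + ":", value)
```

Missing pieces, such as a second address line or a region, are returned as empty strings so that every row has the same columns.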

| Data you have | Date of Birth from VIAF (Virtual International Authority File) | Date of Death from VIAF (Virtual International Authority File) |
|---|---|---|
| Braddon, M. E. (Mary Elizabeth) | 1835 | 1915 |
| Rossetti, William Michael | 1829 | 1919 |
| Prest, Thomas Peckett | 1810 | 1879 |

Basics of OpenRefine

You can find out a lot more about OpenRefine at http://openrefine.org, and there are some good introductory videos to watch. There is a Google Group where many beginner questions and problems are answered. OpenRefine recipes, scripts, projects, and extensions are also available; you can copy them into your own OpenRefine instance and run them on your dataset.

The OpenRefine GitHub wiki has a reference for the General Refine Expression Language (GREL).

Features

Key Points

  • OpenRefine is ‘a tool for working with messy data’

  • OpenRefine works best with data in a simple tabular format

  • OpenRefine can help you split data up into more granular parts

  • OpenRefine can help you match local data up to other data sets

  • OpenRefine can help you enhance a data set with data from other sources