First draft – Data quality checklist for dataviz projects

First draft of what could have been a very cool checklist for dataviz efforts where the data is also made reusable.

Structure and format

To be reusable, data needs to be structured consistently and to follow pre-existing standards, vocabularies, and formats.

The basics

  • The data and metadata are provided in a non-proprietary, structured format.
  • The data structure is defined.
  • The metadata structure is defined.

Use of standards

  • All labels in the dataset follow the same naming convention.
  • The naming convention is made available along with the dataset.
  • The vocabularies used in the dataset are consistent.
  • Use of abbreviations in the dataset is consistent.
  • References to other datasets or standards are identified (eg. ISO 3166 2-character country codes for countries, IANA TZ for timezones).


  • Units of measurement are defined for all fields.
  • Date and time formats are defined.


Good metadata will facilitate reuse of the dataset, providing the necessary information to identify provenance of the data but also providing context.

The basics

  • Metadata is provided with the dataset
  • Metadata is up to date

For the dataset, are documented

  • Title
  • Description
  • Unique identifier
  • Category
  • Usage license
  • Conditions of attribution, reuse, redistribution and commercialisation
  • Creation date
  • Last update date
  • Publication date
  • Author
  • Publisher
  • Mean of contact for the author or publisher
  • Data transformations
  • Data sources
  • Changelog
  • Update frequency

For geospatial datasets, are documented

  • Coverage area
  • Coordinate system
  • Map projection



  • There are no duplicate records in the dataset.
  • Missing records are accounted for, to decide whether to surface or ignore them.
  • Record structure is the same for all records.
  • Units of measurement are the same for all records.
  • Missing values are accounted for, to decide whether to surface or ignore them.