Data publishing

TERN’s national ecosystem data infrastructure is helping the Australian ecosystem science community to make the most of its collective data resources, by enabling broad participation of researchers in data publication and providing access to a wealth of ecosystem data. Our philosophy is ‘collect data once – make it discoverable – re-use it many times’, and we’ve planned and built the national ecosystem data infrastructure to support all stages of the ecosystem data and research cycle. In essence, TERN can now offer a one-stop-shop solution for data storage, data publishing, citation (through the minting of digital object identifiers, or DOIs), licensing and discoverability. Building the infrastructure, tools, and culture to support data publication is a vital element in TERN’s vision of a collaborative, networked ecosystem science community.

You can click here to read more about data publishing, and the benefits it offers for ecosystem science and management. To see how TERN’s infrastructure can be used to support data publishing – and has already been used to successfully publish a wide range of nationally significant ecosystem datasets – click here.

Data publishing continuum

A diversity of approaches to data publishing is used throughout the ecosystem science community. TERN’s infrastructure – national, coordinated, networked – has been designed to support the diverse needs of the entire ecosystem science community, and to ensure its members receive maximum benefit from participating in data publication. The continuum below demonstrates some of the benefits that can arise from using infrastructure such as TERN’s to publish data.

Approaches to Data Publication

Individual (e.g. individual researcher – data stored internally, sometimes shared by individual agreement with collaborators.)

Research Group (e.g. research team – data stored internally, shared amongst group, and with some external collaborators by individual agreement.)

National data collection, storage and discovery infrastructure (e.g. data are appropriately described with their contextual information and published in a standard format. Datasets are discoverable from national infrastructure, e.g. TERN.)

Global data storage and discovery (e.g. data are published in a standard, machine-readable (e.g. ASCII), publicly accessible open access format and platform.)

| Characteristics of approach used | Individual | Research Group | National data collection, storage and discovery infrastructure | Global data storage and discovery |
| --- | --- | --- | --- | --- |
| Attributes required for data description, discovery and re-use | Very limited, on an ad hoc basis. | Limited, on an ad hoc basis. | Comprehensive, done using consistent, standard formats and codes. | General, done using consistent, standard formats and codes. |
| Data format | Differing file formats – whatever is convenient for the researcher. | Differing file formats – whatever is convenient for the researcher and research group. | Consistent, standard data formats and codes to represent data. | Consistent, standard format (one type, machine readable). |
| Metadata / information on context of data | Little to none, minimal support for data discovery. | Little – just enough for the Group members to understand the data. | Consistent, rich metadata published alongside data, enabling a substantial improvement in the data half-life period. | Accurate description of data in machine-readable format. |
| Storage | Local hard disk, which may not be accessible by others. | Local network. | Network of distributed repositories. | |
| Discoverability | Not possible, as data are not made discoverable and have only limited metadata. | Discovery may be possible, but access is most likely restricted. | Data can be discovered from repositories thanks to appropriate contextual information. | Data are discoverable in a network of data services. |
| Accessibility | Unable to share data without significant personal interactions. | Access for a limited set of people, with various restrictions. | Access is maintained by licence and is mostly open; access to data will be via web services. | Data can be queried and accessed via a machine-to-machine interface. |
| Ability to use and measure data as a research output | Very limited. | Limited. | High. | Moderate. |
| Data formally recognised as research output | Not possible due to limited description and accessibility. | Not possible without making data accessible with appropriate metadata. | Yes: A persistent identifier (e.g. DOI) is attached to data, enabling identification of data as output. Data are discoverable via searchable portals, and access is provided under appropriate licensing. The persistent identifier makes data citable and makes it possible to track citation and re-use. | Yes: A persistent identifier is attached to the dataset, enabling identification of data as output. This also enables data citation and makes it possible to track citation and re-use. |
| Opportunities to collaborate | Very limited: Only possible via person to person interaction. | Limited: Only possible via person/group to person/group interaction. | High: Possible via ‘portal to person’ interaction, not only person to person. | Moderate: Automated access to data via machine-to-machine interface. |
| Ability to contribute to multi- and inter-disciplinary sciences | Very limited: Fewer opportunities for others to be aware of data and research outputs, hence less opportunity for uptake / re-use of data by others. | Limited: Fewer opportunities for others to be aware of data and research outputs, hence less opportunity for uptake / re-use of data by others. | High: Data discoverable and accessible by wide audiences. Availability of rich metadata gives the context of data to enable appropriate re-use. | Moderate: Data discoverable and accessible by wide audiences. |
| Quality assurance | Very limited: Unable to verify accuracy and quality of data without human interaction. Accuracy and quality not formally documented. | Limited: Unable to verify accuracy and quality of data unless it is documented. | High: Governance structure for better data curation and quality checks, with appropriate metadata to reflect the quality of data. | Moderate: The description reflects the quality of data. |
| Capacity for data re-use, and risks for data creators and users | Very limited capacity for re-use, high risks to both data creator and potential data users. | Limited capacity for re-use outside research group, high risks to both data creator and potential data users. | Very high capacity for re-use, low risks to data creator and potential data users. | Moderate capacity for informed re-use, moderate risks to data creator and potential data users. |
| Risks to data creator for data storing and sharing | Data are not systematically catalogued, safely stored or discoverable, so there is a risk of data loss. Unable to publish data, or data publication is on a case-by-case basis requiring high effort. Unable to share data without significant personal interactions. | Unable to publish data, or data publication is on a case-by-case basis requiring high effort. Unable to share outside own community without significant personal interactions. Data may be ‘mis-used’ by data re-users due to limited availability of metadata and contextual information. | Data may be ‘mis-used’ by data re-users. This risk is ameliorated by the provision of rich metadata and contextual information with datasets. Published data may be used by others before the data creator has used them. Uncertainty in long-term funding to manage and maintain the infrastructure. | Data may be ‘mis-used’ due to limited interaction with metadata and contextual information. |
| Risks to data user | Very limited access to data and metadata. Lack of metadata and contextual information increases possibility of data ‘mis-use’ or misinterpretation. | Very limited access to data and metadata. Lack of metadata and contextual information increases possibility of data ‘mis-use’ or misinterpretation. | | Data are published for machine-to-machine access. Increased possibility of data ‘mis-use’ due to limited interaction with metadata and contextual information. |