
Data Storage Protocols

Overview

Data storage in the GCE Information System is organized to balance analytical, accessibility, security, and archival considerations for each resource.  Storage specifications for metadata, digital data, and printed data archives are as follows:

Metadata

The metadata describe the physical and logical structure of a data set, as well as the hypotheses, methodology and researchers responsible for its creation.  The primary repository for GCE metadata is the Metabase, a relational database developed using Microsoft SQL Server® 7.0.  The Metabase is secured using both network and database security layers, and is accessed primarily through web applications available on the GCE LTER Public Web Site and GCE LTER Project Web Site.  Limited write access will be provided through data submission forms (project web only), and read access will be provided through submission editing forms (project web) and metadata search and display applications (both web sites).

Database files are synchronized between the GCE data management workstation and server, and regularly backed up to magnetic tape and CD.

Digital Data

The primary repository for digital data (e.g. submissions, processed data, and archived data) is maintained in the GCE LTER data management office (110B Marine Sciences Department, University of Georgia).  Data files are protected by several layers of computer security (using Windows 2000 NTFS access control and TCP/IP firewall software), and regularly backed up to additional hard drives, magnetic tape, and CD.  Redundant offsite archives will be established shortly.  Several data formats are currently supported, as described under Data File Formats below.

As data files become candidates for online access, copies will be transferred to the GCE project server in the Marine Sciences Department at the University of Georgia.  Access to these files will be controlled as appropriate by network file security protocols and web-based data access programs available through the GCE LTER Public Web Site and GCE LTER Project Web Site.

Printed Data

Digital data submissions and processed data are printed on paper and archived in the GCE LTER data management office (110B Marine Sciences Dept., University of Georgia) to provide a written record in case of electronic data loss or corruption.  Access will be controlled by the data manager, using appropriate physical security measures.

Poster presentations, printed reports, news clippings, and other printed matter submitted to the data manager will also be archived in the GCE LTER data management office.

Data File Formats

MATLAB Files (*.mat)

MATLAB binary files are the primary storage format for GCE tabular data sets.  Data are organized as two types of structure variables: data structures (named "data") and stat structures (named "stats_all" or "stats_unflagged").  These variable types are described below:

GCE Data Structures

GCE Data Structures are multidimensional MATLAB® 5.x structure variables designed to store fully-documented tabular data sets (specifications).  MATLAB functions in the GCE Data Toolbox provide a layer of abstraction, allowing users to work with information in data structures without requiring direct manipulation of the structure itself.  Toolbox functions also programmatically preserve row correspondence between data columns, transfer metadata content when creating new structures, and transparently store function processing history information.  This allows users to manipulate data structures without compromising their information quality.
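
As a rough illustration of this idea (not the GCE Data Toolbox itself, whose functions are written in MATLAB), the Python sketch below shows how a wrapper function can apply one row selection to every column, carry metadata forward, and append a processing-history entry.  The names `DataSet` and `select_rows`, and the field layout, are hypothetical and are not taken from the GCE specifications.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataSet:
    """Hypothetical stand-in for a documented tabular data set."""
    names: list                   # column names
    columns: list                 # one list of values per column, equal lengths
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def select_rows(ds, keep):
    """Return a new DataSet holding only the rows where keep[i] is true."""
    n_rows = len(ds.columns[0]) if ds.columns else 0
    if len(keep) != n_rows or any(len(c) != n_rows for c in ds.columns):
        raise ValueError("row mask and all columns must have the same length")
    # Apply the same mask to every column so rows stay aligned.
    new_columns = [[v for v, k in zip(col, keep) if k] for col in ds.columns]
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = f"{stamp}: select_rows kept {sum(map(bool, keep))} of {n_rows} rows"
    return DataSet(
        names=list(ds.names),
        columns=new_columns,
        metadata=dict(ds.metadata),      # metadata carried forward
        history=ds.history + [entry],    # processing history extended
    )


ds = DataSet(
    names=["site", "salinity"],
    columns=[["A", "B", "C"], [31.2, 28.7, 30.1]],
    metadata={"title": "example"},
)
subset = select_rows(ds, [True, False, True])
print(subset.columns)
print(subset.history)
```

Because callers only go through `select_rows`, they cannot filter one column and forget another, and the original `ds` is left unchanged.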

GCE Stat Structures

GCE Stat Structures are multidimensional MATLAB® 5.x structure variables designed to store statistical summary information for a single GCE Data Structure (specifications).  This information will be used to summarize data sets and provide authentication information for data documentation.

Column statistics can be calculated for all observations or only unflagged observations, either ungrouped or grouped by the values in one key column.  Appropriate statistics are calculated according to the physical and logical data types of each column.
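
The Python sketch below illustrates the all-versus-unflagged and grouped-versus-ungrouped options for a single numeric column.  It is a minimal sketch under stated assumptions, not the GCE implementation: the function name `column_stats`, the use of an empty string to mean "unflagged", and the particular statistics reported are all hypothetical, and it does not attempt the data-type-dependent selection of statistics described above.

```python
from collections import defaultdict
from statistics import mean


def column_stats(values, flags, keys=None, unflagged_only=False):
    """Summarize one numeric column, optionally grouped by a key column.

    values: list of numbers, one per row
    flags:  list of flag strings, one per row ('' is assumed to mean unflagged)
    keys:   optional list of group labels, one per row
    """
    groups = defaultdict(list)
    for i, (value, flag) in enumerate(zip(values, flags)):
        if unflagged_only and flag:
            continue  # skip flagged observations
        label = keys[i] if keys is not None else "all"
        groups[label].append(value)
    return {
        label: {"n": len(v), "min": min(v), "max": max(v), "mean": mean(v)}
        for label, v in groups.items()
    }


values = [10.0, 12.0, 50.0, 11.0]
flags = ["", "", "Q", ""]          # third observation carries a flag
sites = ["A", "A", "B", "B"]       # key column used for grouping

stats_all = column_stats(values, flags)
stats_unflagged = column_stats(values, flags, keys=sites, unflagged_only=True)
print(stats_all)
print(stats_unflagged)
```

In this example the flagged value of 50.0 contributes to `stats_all` but is excluded from the per-site results in `stats_unflagged`.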

Text Files (*.txt, *.doc)

Text data files are secondary files generated from MATLAB data files to provide an open standards-based archive of the data and documentation.  Data are stored as tab-delimited columns of numbers and text, formatted according to information in the MATLAB structures.  Column descriptions, metadata, and summary statistics are provided in separate non-delimited documentation files (*.doc).
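
The layout described above can be sketched in Python as follows.  This is an illustration only: the function name `export_text`, the header row in the data file, and the layout of the documentation file are assumptions, not the format actually produced for GCE data sets.  Note that the `.doc` file here is plain text, following the description above.

```python
import csv
import tempfile
from pathlib import Path


def export_text(names, units, rows, metadata, out_dir, stem):
    """Write rows to <stem>.txt (tab-delimited) and documentation to <stem>.doc."""
    out_dir = Path(out_dir)
    data_path = out_dir / f"{stem}.txt"
    doc_path = out_dir / f"{stem}.doc"
    # Data file: one header row (an assumption), then tab-delimited values.
    with open(data_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(names)
        writer.writerows(rows)
    # Documentation file: free-form text kept separate from the data.
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    lines.append("Columns:")
    lines += [f"  {name} ({unit})" for name, unit in zip(names, units)]
    doc_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return data_path, doc_path


with tempfile.TemporaryDirectory() as tmp:
    data_path, doc_path = export_text(
        names=["site", "salinity"],
        units=["none", "PSU"],
        rows=[["A", 31.2], ["B", 28.7]],
        metadata={"Title": "example data set"},
        out_dir=tmp,
        stem="example",
    )
    data_text = data_path.read_text(encoding="utf-8")
    doc_text = doc_path.read_text(encoding="utf-8")

print(data_text)
print(doc_text)
```

Keeping the documentation in its own file leaves the data file as a plain table that any spreadsheet or statistics package can read without stripping header text first.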

This material is based upon work supported by the National Science Foundation under grant numbers OCE-9982133 and OCE-0620959.  Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.