Instructions

Requirements:

  • All samples for which chemical concentration data are reported must also be recorded in a separate sample metadata file, and each sample's Sample_Name must be identical in both files.

  • Data must be saved as csv files, even if the Excel-based templates are used (i.e., all files must be converted to csv before uploading to ESS-DIVE). You may additionally include Excel (or other proprietary-format) files, as long as the same data are also provided in csv format.

  • Compliance with this reporting format automatically ensures compliance with the upstream reporting formats (i.e., package-, file-, csv-, and sample-level metadata), unless the dataset also contains other types of data, in which case the reporting formats for those data types must be followed as well. However, if you modify the provided templates (or partially or completely disregard them), you will need to double-check compliance with each upstream reporting format, as well as with the specific requirements of this reporting format, to ensure that your data and metadata files meet the minimum requirements at each level.
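The Sample_Name requirement above, together with the uniqueness rule in step 2 below, can be checked before uploading. The following Python sketch is illustrative only and is not an ESS-DIVE tool. It assumes samples are stored in rows and that both files have a column literally named Sample_Name; the function names and the key argument are hypothetical. Duplicates are checked only in the sample metadata file, since a data file may legitimately list the same sample more than once (e.g., one row per measured variable).

```python
import csv
import io
from collections import Counter


def read_sample_names(csv_text, key="Sample_Name"):
    """Return the Sample_Name column as a list (assumes one sample per row)."""
    return [row[key] for row in csv.DictReader(io.StringIO(csv_text))]


def sample_name_problems(data_csv, metadata_csv, key="Sample_Name"):
    """Return (duplicate names in the metadata file, data-file names missing from the metadata file)."""
    metadata_names = read_sample_names(metadata_csv, key)
    data_names = read_sample_names(data_csv, key)
    # Each sample must have a unique name in the sample metadata file.
    duplicates = sorted(name for name, count in Counter(metadata_names).items() if count > 1)
    # Exact string comparison: "SR-01" and "sr-01" are treated as different names.
    missing = sorted(set(data_names) - set(metadata_names))
    return duplicates, missing


# Made-up example contents for illustration.
data = "Sample_Name,Material,Nitrate\nSR-01,Water,0.5\nSR-02,Water,0.7\n"
metadata = "Sample_Name,Material\nSR-01,Water\nSR-01,Water\nSR-03,Water\n"
print(sample_name_problems(data, metadata))  # (['SR-01'], ['SR-02'])
```

In practice you would pass the text of your own files, for example read with pathlib's Path.read_text().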

To create a dataset containing chemical concentration data for water/soil/sediment samples, follow these steps:

  1. Decide how you want to organize your data. For example: one data file or several? Samples in rows or in columns? One measured variable per data file or several, and if several, in rows or in columns? Method metadata details in the same file as the data or in a separate file? We strongly recommend using separate files for the data and the detailed method metadata, because this simplifies reuse of your method metadata across datasets and makes the data table easier to overview.

  2. Create your sample metadata file(s). Make sure each of your samples has a unique name, and assemble the sample metadata table following the instructions for sample metadata.

  3. Assemble the data table(s), making sure that all required fields are included and formatted correctly (see example) and that the Sample_Names and Material fields match those in the sample metadata file. Empty rows or cells are NOT allowed, including in header rows and columns: use “-9999” for missing values in numerical fields and “N/A” in text fields. Using the data_template is strongly recommended. Note that header rows can be turned into columns and vice versa if that better suits your data (see example). Alternatively, the required metadata for each measured variable may be recorded in the data dictionary (_dd.csv) file, as long as all required fields are included.

  4. Assemble the methods file (methods_template), making sure all required fields are included and formatted correctly. Every method description must be detailed enough for someone else to repeat the procedure. We recommend using separate columns for details that are relevant to quick assessments of sample integrity, method applicability, and data quality (e.g., temperature, light, and atmospheric conditions). See example.

  5. (Optional) Prepare a terminology file (template). If you use data flags or other codes (other than Field_Names, methodIDs, or Sample_Names that are already defined, with their required metadata, in the data dictionary, methods, or sample metadata file(s)), you must explain them either in a separate terminology file (see example) or in the data dictionary, with an added column for the required “Term_Type” (see example). We recommend assembling all terms and codes used within your project in one “master” terminology file, which can either be included in its entirety in every dataset or used to create a dataset-specific terminology file containing only the terms relevant to that dataset. This helps keep terminology consistent within the project (e.g., between team members or subprojects) and minimizes the time spent generating terminology and data dictionary files for future dataset submissions.

  6. Save all files to be included in the dataset in csv format with the appropriate file ending (i.e., _data.csv, _methods.csv, _terminology.csv, _dd.csv). Filenames must be unique and should describe the file contents as specifically as possible (e.g., 2018_SlateRiver_soil_data.csv, WHONDRS_methods.csv, WatershedFunctionSFA_terminology.csv). Use only letters (e.g., camelCase), numbers, and underscores (_); do not include spaces. Hyphens are allowed but not preferred. No other special characters are allowed in filenames.

  7. List all terms (Column_or_Row_Name) in the data dictionary table(s) (template), providing the required metadata for csv files (see example). Even if units are included in the data file (recommended), you must still specify the unit of each variable in the data dictionary. You may create one _dd.csv file for each individual file, or use the wildcard "*" option to generate one _dd.csv for multiple (or all) files in the dataset (e.g., if the same data dictionary applies to several groundwater sample concentration data files, you may name it “groundwater_chem_*_dd.csv”). We recommend keeping the data dictionaries for methods and terminology files separate from each other and from those for data files, because these file types have different structures. If you follow the recommendation to maintain one master methods file and one master terminology file, you should be able to reuse the same data dictionary for each of them in future datasets (unless your project's terminology or methods change). You should also be able to assemble data dictionaries from the master terminology file by copying the relevant terms for the specific files. The data dictionary is required for all csv files under the file-level metadata reporting format: it allows others to translate and combine data from different projects and packages, and it enables automatic extraction of file-level metadata, which saves you time and effort.

  8. Assemble the file level metadata (FLMD) table for all files in your dataset (template). Note that all files included in a dataset must be listed in the FLMD table.

Figure 1. Example of FLMD table for a dataset with two data files from SLAC SFA 2018 field campaign at Slate River, CO. Top: Columns A-H, Bottom: Columns I-Q.
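For step 3 above, the following Python sketch fills empty cells with the required placeholders. It is a minimal illustration under stated assumptions: rows come from something like csv.DictReader, every row has the same number of fields as the header, you supply the set of numerical column names yourself, and header cells are not checked. Rows with no content at all are dropped rather than padded with placeholders, which is one possible reading of the "no empty rows" rule. The function name is hypothetical.

```python
MISSING_NUMERIC = "-9999"  # placeholder for numerical fields
MISSING_TEXT = "N/A"       # placeholder for text fields


def fill_missing(rows, numeric_columns):
    """Replace empty cells with placeholders and drop rows that are entirely empty.

    rows: list of dicts (e.g., from csv.DictReader); numeric_columns: set of column names.
    Note: surrounding whitespace is stripped from every value.
    """
    filled = []
    for row in rows:
        # Treat None (short rows) and whitespace-only strings as empty.
        values = {column: (value or "").strip() for column, value in row.items()}
        if not any(values.values()):
            continue  # assumption: an all-empty row is removed, not filled
        filled.append({
            column: value if value else (MISSING_NUMERIC if column in numeric_columns else MISSING_TEXT)
            for column, value in values.items()
        })
    return filled


rows = [{"Sample_Name": "SR-01", "Nitrate": "", "Notes": " "},
        {"Sample_Name": "", "Nitrate": "", "Notes": ""}]
print(fill_missing(rows, {"Nitrate"}))
# [{'Sample_Name': 'SR-01', 'Nitrate': '-9999', 'Notes': 'N/A'}]
```

The filled rows can then be written back out with csv.DictWriter.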
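For step 5, one way to find flags that still need a definition is to compare the codes used in a flag column against the terms listed in the terminology file. In this Python sketch the column names Term and Nitrate_Flag, the flag codes, and the function name are all hypothetical. Each flag cell is assumed to hold a single code, and “N/A” is skipped because it is the missing-value placeholder rather than a flag.

```python
import csv
import io


def undefined_terms(data_csv, terminology_csv, flag_column, term_column="Term"):
    """Return codes used in flag_column of the data file that the terminology file does not define."""
    # Assumes one code per cell; "Term" is a hypothetical header name.
    used = {row[flag_column].strip() for row in csv.DictReader(io.StringIO(data_csv))}
    used.discard("N/A")  # missing-value placeholder, not a flag
    defined = {row[term_column].strip() for row in csv.DictReader(io.StringIO(terminology_csv))}
    return used - defined


# Made-up example contents for illustration.
data = "Sample_Name,Nitrate,Nitrate_Flag\nSR-01,0.5,BDL\nSR-02,0.7,N/A\nSR-03,0.2,QC_FAIL\n"
terms = "Term,Term_Type,Definition\nBDL,Flag,Below detection limit\n"
print(undefined_terms(data, terms, "Nitrate_Flag"))  # {'QC_FAIL'}
```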
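For step 6, the filename rules can be encoded in a small check. This Python sketch reflects one reading of the rules and is not an official validator: it only handles the four csv file types listed, treats hyphens as a warning rather than an error, and does not handle data dictionary names that use the "*" wildcard described in step 7. The function and constant names are hypothetical.

```python
import re

# File endings named in step 6 (supplementary non-csv files are not covered here).
ALLOWED_SUFFIXES = ("_data.csv", "_methods.csv", "_terminology.csv", "_dd.csv")
# Letters, digits, underscores, and (discouraged) hyphens only.
STEM_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def filename_problems(filename):
    """Return (errors, warnings) for a csv filename under the rules in step 6."""
    errors, warnings = [], []
    if not filename.endswith(ALLOWED_SUFFIXES):
        errors.append("missing a recognized ending such as _data.csv")
    # Check the part of the name before ".csv" for disallowed characters.
    stem = filename[: -len(".csv")] if filename.endswith(".csv") else filename
    if not STEM_PATTERN.fullmatch(stem):
        errors.append("contains spaces or other special characters")
    elif "-" in stem:
        warnings.append("hyphens are allowed but not preferred")
    return errors, warnings


for name in ["2018_SlateRiver_soil_data.csv", "Slate River data.csv", "SlateRiver-2018_data.csv"]:
    print(name, filename_problems(name))
```

Uniqueness of filenames within the dataset is not checked here; the duplicate-counting approach used for sample names would work the same way for a list of filenames.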
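For step 7, this Python sketch reports data-file columns that have no entry in the data dictionary, and dictionary entries whose unit is blank. It assumes variables are arranged as columns, so the first row of the data file holds the names, and that the dictionary has columns named Column_or_Row_Name and Unit. The Unit header and the function name are hypothetical; adjust them to match your template.

```python
import csv
import io


def check_data_dictionary(data_csv, dd_csv, name_column="Column_or_Row_Name", unit_column="Unit"):
    """Return (data columns missing from the dictionary, dictionary entries with a blank unit)."""
    # Assumes variables are columns, so the header row lists every term to document.
    header = next(csv.reader(io.StringIO(data_csv)))
    dd_rows = list(csv.DictReader(io.StringIO(dd_csv)))
    documented = {row[name_column] for row in dd_rows}
    undocumented = [column for column in header if column not in documented]
    # "Unit" is a hypothetical header name; a blank cell counts as a missing unit.
    no_unit = [row[name_column] for row in dd_rows if not (row.get(unit_column) or "").strip()]
    return undocumented, no_unit


# Made-up example contents for illustration.
data = "Sample_Name,Material,Nitrate,Sulfate\nSR-01,Water,0.5,1.2\n"
dd = ("Column_or_Row_Name,Unit,Definition\n"
      "Sample_Name,N/A,Unique sample name\n"
      "Material,N/A,Sample material\n"
      "Nitrate,,Nitrate concentration\n")
print(check_data_dictionary(data, dd))  # (['Sulfate'], ['Nitrate'])
```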
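Finally, for step 8, the following Python sketch lists files that are present in the dataset but missing from the FLMD table. The File_Name header, the function name, and the file names in the example are assumptions for illustration; check the FLMD template for the actual column name.

```python
import csv
import io


def unlisted_files(dataset_files, flmd_csv, name_column="File_Name"):
    """Return, sorted, the dataset files that do not appear in the FLMD table."""
    # "File_Name" is a hypothetical header name for the FLMD filename column.
    listed = {row[name_column].strip() for row in csv.DictReader(io.StringIO(flmd_csv))}
    return sorted(set(dataset_files) - listed)


# Made-up example contents for illustration.
files = ["2018_SlateRiver_soil_data.csv", "2018_SlateRiver_water_data.csv", "SLAC_methods.csv"]
flmd = ("File_Name,File_Description\n"
        "2018_SlateRiver_soil_data.csv,Soil chemistry\n"
        "SLAC_methods.csv,Methods\n")
print(unlisted_files(files, flmd))  # ['2018_SlateRiver_water_data.csv']
```

In practice, dataset_files could come from os.listdir() on the folder you plan to upload.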
