Proposed format for uploaded data and the associated readme.txt file.
UPDATE: the readme.txt file will be generated automatically from an upload form that the submitter will fill out online.
This version is based on discussions at the first workshop and comments made on the first two days of the second workshop.
A. Directory structure
- Each upload consists of a main directory containing one file called readme.txt (all lower case) and one or more directories whose names come from the following list: data, papers, description, code, figures, autogen
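The layout rule above could be checked mechanically before a manager looks at a submission. The following is only a minimal sketch: the function name check_upload_layout and the wording of the messages are hypothetical, and it assumes a strict reading of the rule (nothing other than readme.txt and the listed directories may appear at the top level).

```python
from pathlib import Path

# Directory names allowed at the top level of an upload (from the list above).
ALLOWED_DIRS = {"data", "papers", "description", "code", "figures", "autogen"}


def check_upload_layout(root):
    """Return a list of layout problems; an empty list means the layout is OK.

    Hypothetical helper. Assumes the strict reading: only readme.txt and
    directories from ALLOWED_DIRS may appear in the main directory.
    """
    entries = {p.name: p for p in Path(root).iterdir()}
    problems = []

    readme = entries.get("readme.txt")  # exact, lower-case name
    if readme is None or not readme.is_file():
        problems.append("missing readme.txt (all lower case)")

    if not any(n in ALLOWED_DIRS and p.is_dir() for n, p in entries.items()):
        problems.append("need at least one of: " + ", ".join(sorted(ALLOWED_DIRS)))

    for name in sorted(entries):
        if name == "readme.txt":
            continue
        if not entries[name].is_dir():
            problems.append(f"unexpected file: {name}")
        elif name not in ALLOWED_DIRS:
            problems.append(f"unexpected directory: {name}")
    return problems
```

Returning a list of problems rather than stopping at the first one lets a submitter fix everything in one pass.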
B. Uploading
Uploading will only take place once the submission has been approved by one of the project managers.
- Registered users should provide a link to a directory so that the managers can view the directory structure and browse individual files.
- Users will also make a single compressed tar file and provide a link to it so that, if the submission is accepted, the tar file can be retrieved.
- The upload process will involve filling out a form which specifies:
- What kind of object is being uploaded (modular form, L-function zeros, etc.). The idea is to specify enough information that someone interested in the data can navigate to it.
- Relationship to existing data (does it correct or extend existing data?)
- The system will record the entered data as well as the name of the person uploading (which is known because users must log in) and a timestamp.
A confirmation page will repeat the cataloging information, report the checksum, etc., so that the submitter can check everything before the final upload.
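As an illustration of the tar file and checksum steps, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and both gzip compression and SHA-256 are assumptions: this proposal does not fix a compression method or a checksum algorithm.

```python
import hashlib
import tarfile
from pathlib import Path


def make_tarball(upload_dir, tar_path):
    """Pack upload_dir into a single gzip-compressed tar file at tar_path.

    tar_path should lie outside upload_dir so the archive does not
    include itself. Gzip is an assumption, not part of the proposal.
    """
    upload_dir = Path(upload_dir).resolve()
    with tarfile.open(tar_path, "w:gz") as tar:
        # Store paths relative to the main directory's own name.
        tar.add(upload_dir, arcname=upload_dir.name)
    return Path(tar_path)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A submitter could run both functions locally and compare the digest with the one shown on the confirmation page.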
C. The readme.txt file will be generated automatically based on an upload form that the contributor will fill out online.
- The readme.txt must be human readable and also sufficiently structured that a script can scan for the most important metadata. Thus:
1. All fields in the readme.txt must be labeled (allowable labels and format of labeling to be decided).
2. The following fields are required (and the system will reject uploads that omit one of these fields).
- Title, who created the data, how it was created, when it was created, description of the data, file list
3. The following fields will be filled out automatically (or take on their default values) if omitted.
- Licensing
4. The following information is strongly encouraged but not required
- Operating system on which the data was created, run time to create the data, description of file contents, rigor (expected number of correct digits and how the data was checked), official name of the object (e.g., Selberg data), how the data is encoded (the structure of the data, needed to read it into a database)
5. The following information can be included if appropriate
- Research paper which gave rise to this data
6. The readme.txt file that users see when they browse for data will NOT be identical to the uploaded file. The system will add certain information, such as:
- Timestamp, Cite this data as, Checksum (may be tricky if one upload has several data files, such as Dirichlet coefficients and zeros)
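To make the "human readable but scannable by a script" requirement concrete, here is a minimal sketch of a parser and a required-field check. Since the allowable labels and the labeling format are still to be decided, everything specific here is an assumption: the "Label: value" line format, indented continuation lines, the label names in REQUIRED, and the placeholder default for the license.

```python
# Hypothetical labels for the required fields listed in item 2 above.
REQUIRED = ["Title", "Created by", "How created", "Date created",
            "Description", "File list"]

# Placeholder only; the actual default license is not specified here.
DEFAULTS = {"License": "(default license, to be decided)"}


def parse_readme(text):
    """Parse 'Label: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current is not None:
            fields[current] += "\n" + line.strip()
        elif ":" in line:
            label, value = line.split(":", 1)
            current = label.strip()
            fields[current] = value.strip()
    return fields


def missing_required(fields):
    """Return the required labels that are absent or empty."""
    return [label for label in REQUIRED if not fields.get(label)]


def with_defaults(fields):
    """Return a copy of fields with defaults filled in where omitted."""
    merged = dict(fields)
    for label, value in DEFAULTS.items():
        if not merged.get(label):
            merged[label] = value
    return merged
```

Under these assumptions, the system would reject an upload whenever missing_required returns a non-empty list, and would apply with_defaults before adding the timestamp, citation line, and checksum.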
Citations
One of the trickier questions is how people should cite data in publications. We want people to be able to refer to the actual data, but we also would like people to reference the research paper that gives rise to the data (if such a paper exists). Here is a proposed example.
In the uploaded readme.txt file I put:
Relevant research paper: D. Farmer and S. Lemurell, Deformations of Maass forms, Math. Comp. 74 (2005), no. 252, 1967--1982
In the public readme.txt the following will appear:
Cite this data as: D. Farmer and S. Lemurell, Deformations of Maass forms, Math. Comp. 74 (2005), no. 252, 1967--1982. Data available at http://l-functions.org/Data/Maassforms/GL2/
Note that the "Data available at" citation does not give the full path to the data. It uses the information from the upload form to get you close enough.
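The transformation from the uploaded entry to the public citation line could look like the following sketch. The function name cite_line and its parameters are hypothetical; category_path stands for whatever classification the upload form records (in the example above, Maassforms and GL2), and the paper string is a placeholder.

```python
def cite_line(paper, base_url, category_path):
    """Build the public 'Cite this data as' line.

    paper         -- the 'Relevant research paper' entry from the uploaded readme.txt
    base_url      -- root URL of the data area
    category_path -- path components derived from the upload form (assumed)
    """
    url = base_url.rstrip("/") + "/" + "/".join(category_path) + "/"
    paper = paper.rstrip().rstrip(".")  # avoid a doubled period
    return f"Cite this data as: {paper}. Data available at {url}"


# Placeholder paper entry; the URL pieces follow the example above.
line = cite_line("AUTHORS, TITLE, JOURNAL (YEAR)",
                 "http://l-functions.org/Data/",
                 ["Maassforms", "GL2"])
```

Because the URL is built from the form's classification rather than the file location, it stays valid even if individual files within the upload are reorganized.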
