# README
© 2021 Caltech Library

eprinttools
This is a collection of command line tools and a web service written in Go for working with EPrints 3.3.x EPrint XML, the EPrint REST API, and directly with the EPrints MySQL repository database(s). Caltech Library uses it to render our https://feeds.library.caltech.edu website as well as to migrate content into a new repository system. Some of the command line tools may be of more general interest, while others are specific to Caltech Library’s needs. Much of the test code presumes access to our repositories, so it is specific to our needs.
Go base code
The programs:
- eputil is a command line utility for interacting with (e.g. harvesting) JSON and XML from EPrints’ REST API
    - minimal configuration (because it does so much less!)
- epfmt is a command line utility to pretty print EPrints XML and convert to/from JSON, including a simplified JSON inspired by DataCite and Invenio 3
- doi2eprintxml is a command line program for turning metadata harvested from CrossRef and DataCite into an EPrint XML document based on one or more supplied DOIs
- ep3apid is a Unix style web service for interacting with an EPrint repository via a localhost proxy. It includes the ability to get restricted key lists as well as retrieve a simplified JSON record representing an EPrints record
- ep3harvester is an EPrints 3.x metadata harvesting tool working at the MySQL 8 level for EPrints content. It harvests the contents into a MySQL 8 database, one table per eprints repository storing the harvested metadata in JSON columns. This tool can also harvest CSV files with information for people and groups referenced in the EPrints repositories.
- ep3genfeeds is used to generate the JSON documents that drive our feeds website.
- ep3datasets is a tool to generate dataset collections from previously harvested EPrints repositories
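The XML to JSON conversion that epfmt performs can be illustrated with a minimal sketch. The `record` struct below is a hypothetical, heavily reduced stand-in with assumed element and field names; it is not the package's own EPrint struct, which carries many more fields.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// record is a hypothetical, reduced stand-in for an EPrint record.
// The element names and JSON keys here are assumptions for illustration.
type record struct {
	XMLName  xml.Name `xml:"eprint" json:"-"`
	EPrintID int      `xml:"eprintid" json:"eprint_id"`
	Title    string   `xml:"title" json:"title"`
}

// xmlToJSON decodes one <eprint> element and re-encodes it as indented JSON.
func xmlToJSON(src []byte) ([]byte, error) {
	var rec record
	if err := xml.Unmarshal(src, &rec); err != nil {
		return nil, err
	}
	return json.MarshalIndent(rec, "", "  ")
}

func main() {
	src := []byte(`<eprint><eprintid>42</eprintid><title>An example</title></eprint>`)
	out, err := xmlToJSON(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Struct tags carry both the XML and JSON names, so one type can serve both directions of the conversion.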
Use cases
Two primary use cases have driven development of EPrinttools
- Reusing the metadata and content in our EPrints 3.3.16 repositories (see Caltech Library Feeds)
- Populating our EPrints repository from standardized data sources (see Acacia Project).
Related GitHub projects
- py_dataset, This Python module provides access to dataset collections which we use as intermediate storage for JSON documents and related attachments.
- AMES, The eprinttools command line programs have been made available to Python via the AMES project. This includes support for both reading from and writing to EPrints repository systems.
# Functions
CloseConnections.
CloseJSONStore.
CrossRefWorksToEPrint takes a works object from the CrossRef API and maps the fields into an EPrint struct, returning a new struct or an error.
No description provided by the author
DataCiteWorksToEPrint takes a works object from the DataCite API and maps the fields into an EPrint struct, returning a new struct or an error.
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
FmtHelp lets you process a text block with simple curly brace markup.
GenerateGroupFeed returns a JSON document containing an array of group keys.
GenerateIDNumber generates a unique ID number based on the instance of generation and the default collection name.
GenerateImportID generates a unique ID number based on the instance of generation and the default collection name.
GenerateOfficialURL generates an OfficialURL (i.e.
GeneratePeopleFeed returns a JSON document containing an array of people ids.
GetAllEPrintIDs returns a list of all eprint ids in a repository or an error.
GetAllEPrintIDsWithStatus returns a list of all eprint ids in a repository with a given status or returns an error.
GetAllItems returns a list of simple items (e.g.
GetAllORCIDs returns a list of all ORCIDs in a repository.
GetAllPersonNames returns a list of person names in a repository.
GetAllPersonOrOrgIDs returns a list of creator ids or an error.
GetAllUniqueID returns a list of unique id values in a repository.
GetAllYears returns the publication years found in a repository.
GetDocumentAsEPrint takes a configuration, repoName, eprint id and returns an EPrint struct or error based on the contents in the JSON store.
GetEPrint fetches a single EPrint record via the EPrint REST API.
GetEPrintIDsForDateType returns a list of eprints in a date range or returns an error.
GetEPrintIDsForItem.
GetEPrintIDsForORCID returns a list of eprint ids associated with the ORCID.
GetEPrintIDsForPersonName returns a list of eprint ids for a person's name (family, given).
GetEPrintIDForPersonOrOrgID returns a list of eprint ids associated with the person or organization id.
GetEPrintIDsForUniqueID returns a list of eprints for a DOI.
GetEPrintsIDsForYear returns a list of published eprint IDs for a given year.
GetEPrintIDsInTimestampRange returns a list of EPrintIDs in a created timestamp range or returns an error.
GetEPrintIDsWithStatus returns a list of eprints in a timestamp range for a given status or returns an error.
GetEPrintIDsWithStatusForDateType returns a list of eprints in a date range for a given status or returns an error.
GetEPrintIDsWithStatusInTimestampRange returns a list of EPrintIDs with eprint_status in a field timestamp range or returns an error.
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
GetJSONDocument takes a configuration, repoName, eprint id and returns the JSON source document.
GetJSONRow takes a configuration, repoName, eprint id and returns the table row as JSON source.
GetKeys returns a list of eprint record ids from the EPrints REST API.
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
GetUserBy takes a field name (e.g.
GetUserID takes a username and returns a list of userid.
GetUsernames returns a list of all usernames in a repository.
HarvesterDBSchema returns SQL statements for creating the tables and database for the harvester based on the initialization file provided.
ImportEPrints takes a repository id, eprints structure.
IsPublic takes an EPrintID and returns true if public, false otherwise
Check if an EPrint record "is public".
LoadConfig reads a JSON file and returns a Config structure or error.
NewEPrints returns a *EPrint with the name space set.
OpenConnections.
OpenJSONStore.
No description provided by the author
No description provided by the author
RunDataset will use the eprinttools settings.json config file and a repository ID (e.g.
RunDatasets will use the eprinttools settings.json config file and render dataset collections based on the contents.
No description provided by the author
No description provided by the author
RunGenfeeds will use the config file named by cfgName and render all the directories, JSON documents and non-templated markdown content needed for a feeds v1.1 website in the htdocs directory indicated in the configuration file.
RunGenGroups will use the config file named by cfgName and render all the directories, JSON documents and non-templated markdown content needed for a feeds v1.1 website in the htdocs directory indicated in the configuration file.
RunGenPeople will use the config file named by cfgName and render all the directories, JSON documents and non-templated markdown content needed for a feeds v1.1 website in the htdocs directory indicated in the configuration file.
RunHarvester will use the config file named by cfgName and the start and end time strings, if set, to retrieve all eprint records created or modified during that time range.
RunHarvestGroups loads CSV files containing people and group crosswalk tables.
RunHarvestPeople loads CSV files containing people and group crosswalk tables.
RunHarvestRepoID will use the config file named by cfgName, the repository id, and the start and end time strings, if set, to retrieve all eprint records created or modified during that time range for that repository.
No description provided by the author
SaveJSONDocument takes a configuration, repoName, eprint id as integer and JSON source saving it to the appropriate JSON table.
No description provided by the author
SetDefaults sets the default values for DefaultCollection, DefaultRights, Default.
SQLCreateEPrint will read an EPrint structure and generate SQL INSERT, REPLACE and DELETE statements suitable for creating a new EPrint record in the repository.
No description provided by the author
SQLReadEPrint expects a repository map and EPrint ID and will generate a series of SELECT statements populating a new EPrint struct or return an error (e.g.
No description provided by the author
No description provided by the author
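As an illustration of the "simple curly brace markup" that FmtHelp is described as processing, here is a minimal sketch. The function name `fmtHelp`, its parameters, and the placeholder names `{app_name}` and `{version}` are assumptions made for this example; the actual signature and placeholder set are not given in this listing.

```go
package main

import (
	"fmt"
	"strings"
)

// fmtHelp is a hypothetical stand-in for FmtHelp. It swaps curly brace
// placeholders in a help text for concrete values. The placeholder names
// are assumed for illustration only.
func fmtHelp(src, appName, version string) string {
	r := strings.NewReplacer(
		"{app_name}", appName,
		"{version}", version,
	)
	return r.Replace(src)
}

func main() {
	help := "USAGE: {app_name} [OPTIONS]\n{app_name} version {version}"
	fmt.Println(fmtHelp(help, "eputil", "v0.0.0"))
}
```

A `strings.Replacer` handles all placeholders in a single pass, which keeps the help text itself free of formatting verbs.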
# Constants
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
ReleaseDate, the date version.go was generated.
ReleaseHash, the Git hash when version.go was generated.
Version number of release.
# Variables
DefaultCollection holds the default collection to use on deposit.
DefaultOfficialURL holds a URL prefix for the persistent URL, an ID Number will get added when generating per record official_url values.
DefaultRefereed.
DefaultRights sets the eprint.Rights to a default value on deposit.
DefaultStatus.
No description provided by the author
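The description of DefaultOfficialURL (a URL prefix to which an ID Number is added per record) suggests a simple join. The sketch below shows one way that could work; the helper name `officialURL`, the example prefix, and the ID format are all hypothetical, and the package's own GenerateOfficialURL may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultOfficialURL is a placeholder prefix, not a real repository URL.
var defaultOfficialURL = "https://example.org/records"

// officialURL joins a URL prefix and an ID number with exactly one slash.
// This is an assumed rule for illustration, not the library's own logic.
func officialURL(prefix, idNumber string) string {
	return strings.TrimSuffix(prefix, "/") + "/" + idNumber
}

func main() {
	// The ID number format here is made up for the example.
	fmt.Println(officialURL(defaultOfficialURL, "EXAMPLE:20210101-000000001"))
}
```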
# Structs
AccompanimentItemList.
AltTitleItemList.
ConductorItemList.
ConfCreatorItemList.
Config holds a configuration file structure used by the EPrints Extended API. The configuration file is expected to be in JSON format.
ContributorItemList.
CopyrightHolderItemList.
CorpContributorItemList (not used in EPrints, but used in Invenio).
CorpCreatorItemList.
CreatorItemList holds a list of authors.
DataSource can contain one or more types of datasources.
DivisionItemList.
Doc holds the data structure of our jsonstore row.
Document structures inside a Record (i.e.
EditorItemList holds a list of editors.
No description provided by the author
EPrint is the record contained in an EPrints XML document such as those used to store revisions.
EPrints is the high level XML you get from the REST API.
EPrintsDataSet is a struct for parsing the HTML page that returns a list of available EPrint IDs with links.
EPrintUser is a struct for representing a user in an EPrint repository.
ExhibitorItemList.
File structures in Document.
FunderItemList.
Group holds the data structure representing the group information and the crosswalk IDs maintained in groups.csv.
GScholarItemList.
Item is a generic type used by various fields (e.g.
ItemIssueItemList.
KeywordItemList.
LearningLevelItemList.
LocalGroupItemList holds the related URLs (e.g.
LyricistItemList.
Name handles the "name" types found in Items.
OptionMajorItemList.
OptionMinorItemList.
OtherNumberingSystemItemList.
PatentAssigneeItemList.
PatentClassificationItemList.
Person holds the data structure representing the general person information and the crosswalk IDs maintained in the people.csv file.
ProducerItemList.
ProjectItemList.
ReferenceItemList.
ReferenceTextItemList.
RelatedPatentItemList.
RelatedURLItemList holds the related URLs (e.g.
RelationItemList is an array of pointers to Item structs.
ShelfItemList.
SkillAreaItemList.
SubjectItemList.
ThesisAdvisorItemList.
ThesisCommitteeItemList.
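Since Config is described as a JSON configuration file structure, loading one can be sketched as below. The `config` and `dataSource` types and all of their field names and JSON keys are hypothetical; the real Config struct's fields are not listed here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// dataSource and config are hypothetical, reduced stand-ins.
// Field names and JSON keys are assumptions for illustration.
type dataSource struct {
	DSN string `json:"dsn"`
}

type config struct {
	Hostname     string                `json:"hostname"`
	Repositories map[string]dataSource `json:"repositories"`
}

// parseConfig decodes JSON source into a config or returns an error.
func parseConfig(src []byte) (*config, error) {
	cfg := new(config)
	if err := json.Unmarshal(src, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

// loadConfig reads the named JSON file and decodes it.
func loadConfig(name string) (*config, error) {
	src, err := os.ReadFile(name)
	if err != nil {
		return nil, err
	}
	return parseConfig(src)
}

func main() {
	src := []byte(`{"hostname": "localhost:8484", "repositories": {"example": {"dsn": "user:secret@/example"}}}`)
	cfg, err := parseConfig(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Hostname, len(cfg.Repositories))
}
```

Splitting parsing from file reading keeps the decoding step easy to exercise without touching the file system.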
# Interfaces
ItemsInterface describes a common set of operations on an item list.
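To show how a common interface lets code treat the many item list types uniformly, here is a minimal sketch. The method set (`Append`, `Length`, `IndexOf`) and the `item` and `creatorList` types are assumptions for illustration; this listing does not spell out the actual methods of ItemsInterface.

```go
package main

import "fmt"

// item is a hypothetical, reduced stand-in for the package's Item type.
type item struct {
	Name string
}

// itemList is an assumed method set for an item list interface.
type itemList interface {
	Append(*item) int
	Length() int
	IndexOf(int) *item
}

// creatorList is a hypothetical list type satisfying itemList.
type creatorList struct {
	Items []*item
}

// Append adds an item and returns the new length of the list.
func (l *creatorList) Append(it *item) int {
	l.Items = append(l.Items, it)
	return len(l.Items)
}

// Length returns the number of items in the list.
func (l *creatorList) Length() int { return len(l.Items) }

// IndexOf returns the item at position i, or nil if i is out of range.
func (l *creatorList) IndexOf(i int) *item {
	if i < 0 || i >= len(l.Items) {
		return nil
	}
	return l.Items[i]
}

// names works with any list type that satisfies itemList.
func names(l itemList) []string {
	out := make([]string, 0, l.Length())
	for i := 0; i < l.Length(); i++ {
		out = append(out, l.IndexOf(i).Name)
	}
	return out
}

func main() {
	var l itemList = &creatorList{}
	l.Append(&item{Name: "Doe, Jane"})
	l.Append(&item{Name: "Roe, Richard"})
	fmt.Println(names(l))
}
```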
# Type aliases
DocumentList is an array of pointers to Document structs.