github.com/caltechlibrary/irdmtools
Version: 0.0.89
Repository: https://github.com/caltechlibrary/irdmtools.git
Documentation: pkg.go.dev

# README

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Institutional Repository Data Management Tools

This is a proof-of-concept set of tools for working with Invenio RDM and migrating content from EPrints to RDM. It consists of a small set of Go based command line programs along with Python scripts and a wrapping irdm Python module. The Go based tooling is designed to work directly with a copy of your repository's database (e.g. Postgres for RDM or MySQL for EPrints).

The proof of concept is being developed around RDM's web services (e.g. REST API and OAI-PMH), PostgreSQL database and external metadata services (e.g. CrossRef, DataCite).

Caltech Library is using irdmtools to migrate content from our legacy EPrints 3.3 repositories (heavily customized) to RDM. After migration, the core Go tools (e.g. rdmutil) will remain useful for curation at the collection level.

Featured Tools

rdmutil

This tool is for interacting with an Invenio RDM repository via RDM's REST and OAI-PMH APIs. It covers most of the JSON API documented at https://inveniordm.docs.cern.ch/. This includes listing, submitting and managing records and draft records.

rdmutil configuration is read either from the environment or a JSON formatted configuration file. See the man page for details.

ep3util

This tool is used for migrating data out of EPrints. It can be used on a copy of your EPrints MySQL database. It parallels rdmutil and is an evolution of our tooling developed in eprinttools. See the man page for details.

eprint2rdm

This tool migrates content from an EPrints repository via the EPrint REST API. It retrieves an EPrint XML representation of the EPrint record and transforms it into a JSON encoded simplified record nearly compatible with Invenio RDM. See the man page for details.
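
To make the XML-to-JSON step more concrete, here is a small Go sketch of the general idea using only the standard library. It is not the crosswalk irdmtools implements: the eprintDoc and simpleRecord types are hypothetical, cover only a title and a date, and the sample XML is made up for the example.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// sampleXML is a made-up, heavily trimmed EPrint XML document.
const sampleXML = `<eprints><eprint><eprintid>123</eprintid><title>A Sample Title</title><date>2020-01-15</date></eprint></eprints>`

// eprintDoc models a tiny, hypothetical subset of an EPrint XML record.
type eprintDoc struct {
	EPrint struct {
		ID    int    `xml:"eprintid"`
		Title string `xml:"title"`
		Date  string `xml:"date"`
	} `xml:"eprint"`
}

// simpleRecord is a hypothetical, reduced target shape that borrows the
// metadata.title and metadata.publication_date names used by RDM records.
type simpleRecord struct {
	Metadata struct {
		Title           string `json:"title"`
		PublicationDate string `json:"publication_date,omitempty"`
	} `json:"metadata"`
}

// crosswalk decodes the XML source and re-encodes the mapped fields as JSON.
func crosswalk(src []byte) ([]byte, error) {
	var in eprintDoc
	if err := xml.Unmarshal(src, &in); err != nil {
		return nil, err
	}
	var out simpleRecord
	out.Metadata.Title = in.EPrint.Title
	out.Metadata.PublicationDate = in.EPrint.Date
	return json.Marshal(out)
}

func main() {
	out, err := crosswalk([]byte(sampleXML))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

A real crosswalk has to handle many more fields (creators, identifiers, resource types), which is where most of the mapping decisions live.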

doi2rdm

This tool will query the CrossRef or DataCite API and convert a works record into a JSON structure compatible with an RDM record (e.g. to be inserted via an RDM API call). See the man page for details.
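
The lookup step can be sketched as follows. This is an illustrative Go example, not doi2rdm's code: the helper names normalizeDOI, doiPrefix, crossRefURL and dataCiteURL are hypothetical, and the sketch only builds request URLs for the public CrossRef (api.crossref.org/works/) and DataCite (api.datacite.org/dois/) endpoints without performing any HTTP request. It assumes the DOI contains no characters that need URL escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDOI strips a leading resolver URL or "doi:" label, if present,
// and returns the bare DOI (prefix, slash, item identifier).
func normalizeDOI(s string) string {
	s = strings.TrimSpace(s)
	prefixes := []string{
		"https://doi.org/", "http://doi.org/",
		"https://dx.doi.org/", "http://dx.doi.org/",
		"doi:",
	}
	for _, p := range prefixes {
		if len(s) >= len(p) && strings.EqualFold(s[:len(p)], p) {
			return s[len(p):]
		}
	}
	return s
}

// doiPrefix returns the part of a bare DOI before the first slash.
func doiPrefix(doi string) string {
	prefix, _, _ := strings.Cut(doi, "/")
	return prefix
}

// crossRefURL builds the CrossRef works URL for a DOI.
func crossRefURL(doi string) string {
	return "https://api.crossref.org/works/" + normalizeDOI(doi)
}

// dataCiteURL builds the DataCite DOI URL for a DOI.
func dataCiteURL(doi string) string {
	return "https://api.datacite.org/dois/" + normalizeDOI(doi)
}

func main() {
	// 10.1000/xyz123 is a placeholder DOI used only for illustration.
	doi := "https://doi.org/10.1000/xyz123"
	fmt.Println(normalizeDOI(doi))
	fmt.Println(doiPrefix(normalizeDOI(doi)))
	fmt.Println(crossRefURL(doi))
	fmt.Println(dataCiteURL(doi))
}
```

The DOI prefix identifies the registrant, so a tool could use it to decide which crosswalk options to apply; whether doi2rdm does so is not stated here.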

ep3ds2citations

This tool takes an EPrint record in a dataset collection and returns an abbreviated record inspired by citeproc. It also supports harvesting selected EPrint records into a dataset collection using the -harvest and -ids options. We use this feature to facilitate creating https://feeds.library.caltech.edu. See the man page for details.

rdmds2citations

This tool takes an RDM record in a dataset collection and returns an abbreviated record inspired by citeproc. It also supports harvesting selected RDM records into a dataset collection using the -harvest and -ids options. We use this feature to facilitate creating https://feeds.library.caltech.edu. See the man page for details.

Requirements

  • An Invenio RDM deployment
  • To build the Go based software and documentation
    • git
    • Go >= 1.22.1
    • Make (e.g. GNU Make)
    • Pandoc >= 3
  • For harvesting content
  • To migrate content from EPrints 3.3 to RDM
    • Python 3 and packages listed in [requirements.txt]

Quick install

If you're running on Linux, macOS or Raspberry Pi OS you may be able to install precompiled irdmtools Go based tools with the following curl command:

curl https://caltechlibrary.github.io/irdmtools/installer.sh | sh

Installation from source

This codebase is speculative. It is likely to change as issues are identified. To install you need to download the source code and compile it. Here are the steps I take to install irdmtools.

git clone git@github.com:caltechlibrary/irdmtools
cd irdmtools
make
make test
make install
python -m pip install -r requirements.txt

Configuration

The Go based tools rely on a properly configured environment (i.e. environment variables set in your POSIX shell). Specific requirements are listed in the man pages for each of the Go based command line programs.

# Packages

No description provided by the author

# Functions

CheckDOI takes a DOI and does a lookup to see if there are any matching .pids.doi.identifier values.
CheckWaitInterval checks to see if an interval of time has been met or exceeded.
CrosswalkCreatorToCitationAgent takes a simplified.Creator and returns a CitationAgent, role (e.g.
CrosswalkCrossRefWork takes a Works object from the CrossRef API and maps the fields into a simplified Record struct, returning a new struct or an error.
CrosswalkDataCiteObject takes an Object from the DataCite API and maps the fields into a simplified Record struct, returning a new struct or an error.
CrosswalkEPrintToRecord implements a crosswalk from an EPrint 3.x EPrint XML record (as a struct) to an Invenio RDM record (as a struct).
CrosswalkPersonOrOrgToCitationAgent takes a simplified.PersonOrOrg and returns a CitationAgent.
CrosswalkRdmToEPrint takes a public RDM record and converts it to an EPrint struct which can be rendered as JSON or XML.
DeleteEndpoint takes an access token and endpoint path along with JSON source as payload and returns JSON source and error value.
DeleteFiles takes a configuration object, a record id and a list of files, and removes the files from a draft.
DiscardDraft takes a configuration object and record id, contacts an RDM instance, deletes a draft of a record and returns an error value.
DoiPrefix takes a DOI and returns the publisher prefix.
EPrintToCitation takes a single EPrint record and returns a single Citation struct.
FmtHelp lets you process a text block with simple curly brace markup.
GetAccess takes an access token, a record id and optionally an access type.
GetAllEPrintIDs returns a list of all eprint ids in a repository or an error.
GetAllEPrintIDsWithStatus returns a list of all eprint ids in a repository with a given status or returns an error.
GetAllItems returns a list of simple items (e.g.
GetAllORCIDs returns a list of all ORCIDs in a repository.
GetAllPersonNames returns a list of person names in a repository.
GetAllPersonOrOrgIDs returns a list of creator ids or an error.
GetAllUniqueID returns a list of unique id values in a repository.
GetAllYears returns the publication years found in a repository.
GetDraft takes a configuration object and record id, contacts an RDM instance and returns an existing draft of a record and an error value.
GetDraftFiles takes a configuration object and record id, contacts an RDM instance and returns the files metadata and an error value.
GetEndpoint takes an access token and endpoint path and returns JSON source and error value.
GetEPrint fetches a single EPrint record via the EPrint REST API or MySQL database if configured.
GetEPrintIDsForDateType returns a list of eprints in a date range or returns an error.
GetEPrintIDsForItem.
GetEPrintIDsForORCID returns a list of eprint ids associated with the ORCID.
GetEPrintIDsForPersonName returns a list of eprint ids for a person's name (family, given).
GetEPrintIDForPersonOrOrgID returns a list of eprint ids associated with the person or organization id.
GetEPrintIDsForUniqueID returns a list of eprints for a DOI.
GetEPrintsIDsForYear returns a list of published eprint IDs for a given year.
GetEPrintIDsInTimestampRange returns a list of EPrintIDs in a created timestamp range or returns an error.
GetEPrintIDsWithStatus returns a list of eprints in a timestamp range for a given status or returns an error.
GetEPrintIDsWithStatusForDateType returns a list of eprints in a date range for a given status or returns an error.
GetEPrintIDsWithStatusInTimestampRange returns a list of EPrintIDs with eprint_status in field timestamp range or returns an error.
GetFile takes a configuration object, record id and filename, contacts an RDM instance and returns the specific file metadata and an error value.
GetFiles takes a configuration object and record id, contacts an RDM instance and returns the files metadata and an error value.
GetKeys returns a list of eprint record ids from the EPrints REST API.
GetModifiedKeys returns a list of eprint record ids from the EPrints MySQL database.
GetModifiedRecordIds takes a configuration object, contacts an RDM instance and returns a list of ids created, deleted or updated in the time range specified.
GetRawRecord takes a configuration object and record id, contacts an RDM instance and returns a map[string]interface{} record.

```
cfg, _ := LoadConfig("config.json")
id := "qez01-2309a"
mapRecord, err := GetRawRecord(cfg, id)
if err != nil {
    // ..
}
```

GetRecord takes a configuration object and record id, contacts an RDM instance and returns a simplified record and an error value.
GetRecordIds takes a configuration object, contacts an RDM instance and returns a list of ids and error.
GetRecordStaleIds takes a configuration object, contacts an RDM instance and returns a list of ids and error.
GetRecordVersions takes a configuration object and record id, queries the Postgres database and returns the matching JSON blobs in the rdm_records_metadata_version table as a JSON array.
GetReview takes a configuration object and record id, and returns a review object (which includes a request id) and error code.
GetUserBy takes a field name (e.g.
GetUserID takes a username and returns a list of userid.
GetUsernames returns a list of all usernames in a repository.
GetVersionLatest takes a configuration object and record id, contacts an RDM instance and returns the versions metadata and an error value.
GetVersions takes a configuration object and record id, contacts an RDM instance and returns the versions metadata and an error value.
IsPublic takes an EPrintID, checks whether the EPrint record "is public", and returns true if public, false otherwise.
JSONMarshal provides a custom JSON encoder to solve an issue with HTML entities getting converted to UTF-8 code points by json.Marshal() and json.MarshalIndent().
JSONMarshalIndent provides a custom JSON encoder to solve an issue with HTML entities getting converted to UTF-8 code points by json.Marshal() and json.MarshalIndent().
JSONUnmarshal is a custom JSON decoder so we can treat numbers easier.
LinkToDoi removes a leading URL reference (DOI link) if found returning the remainder of the DOI string (prefix slash item identifier).
MigrateEPrintDatasetToCitationsDataset takes a dataset of EPrint objects and migrates the ones in the id list to a citation dataset collection.
MigrateRdmDatasetToCitationsDataset takes a dataset of RDM objects and migrates the ones in the id list to a citation dataset collection.
NewConfig generates an empty configuration struct.
NewDraft takes a configuration object and record id, contacts an RDM instance, creates a draft of an existing record and returns an error value.
NewRecord takes a configuration object and JSON record values.
NewRecordVersion takes a configuration object and record id to create the new version draft.
PatchEndpoint takes an access token and endpoint path along with JSON source as payload and returns JSON source and error value.
PostEndpoint takes an access token and endpoint path along with JSON source as payload and returns JSON source and error value.
ProgressETA returns a string with the percentage processed and estimated time remaining.
ProgressIPS returns a string with the elapsed time and increments per second.
PublishRecordVersion takes a configuration object and record id of a new version draft and publishes it.
PutEndpoint takes an access token and endpoint path along with JSON source as payload and returns JSON source and error value.
Convert an RDM record to a citation in a Citation struct.
RequestLogger logs HTTP requests to the service.
RetrieveFile takes a configuration object, record id and filename, contacts an RDM instance and returns the specific file and an error value.
ReviewRequest takes a configuration object, a record id, a decision and an optional comment, contacts an RDM instance and updates the review status for the submitted draft record.
RunEPrintDSToCitationDS migrates contents from an EPrint dataset collection to a citation dataset collection for a given list of ids and repository hostname.
RunRdmDSToCitationDS migrates contents from an RDM dataset collection to a citation dataset collection for a given list of ids and repository hostname.
SampleConfig displays a minimal configuration for the rdmutil cli.
SendToCommunity sends a draft to an RDM community.
SetAccess takes an access token, record id, an access type and value.
Wraps the simplified package with crosswalks.
SetFilesEnable will set the metadata.files.enable value.
SetPubDate will set the metadata.publication_date value.
SetVersion will set the metadata.version value.
SQLCreateEPrint will read an EPrint structure and generate SQL INSERT, REPLACE and DELETE statements suitable for creating a new EPrint record in the repository.
SQLReadEPrint expects a repository map and EPrint ID and will generate a series of SELECT statements populating a new EPrint struct or return an error (e.g.
UpdateDraft takes a configuration object and record id, contacts an RDM instance, updates a draft of a record and returns an error value.
UploadFiles takes a configuration object, a record id and a map of filenames to paths, contacts an RDM instance and adds the files to a draft record.

# Constants

ReleaseDate, the date version.go was generated.
ReleaseHash, the Git hash when version.go was generated.
Version number of release.

# Variables

No description provided by the author

# Structs

Citation implements the data structure for CiteProc's Item representing a single bibliographic citation.
CitationAgent describes a person or organization for the purposes of CiteProc item data.
CitationDate holds date information, this includes support for partial dates (e.g.
CitationIdentifier is a minimal object to identify a type of identifier, e.g.
Config holds the common configuration used by all irdmtools.
Doi2Rdm holds the configuration for doi2rdm cli.
Ep3Util holds the configuration for ep3util cli.
EPrint2Rdm holds the configuration for rdmutil cli.
EPrintKeysPage holds the structure of the HTML page with the EPrint IDs embedded from the EPrint REST API.
EPrintRest the "app" structure for the service.
OAIHeader holds the response items for.
OAIListIdendifiersResponse.
QueryResponse holds the response to /api/records?q=...
RateLimit holds the values used to play nice with OAI-PMH or REST API.
Rdm2EPrint holds the configuration for rdmutil cli.
RdmUtil holds the configuration for rdmutil cli.