Module: github.com/YaleSpinup/ds-api
Category: module, package
Version: 1.2.3
Repository: https://github.com/yalespinup/ds-api.git
Documentation: pkg.go.dev

# README

ds-api

This API provides access to the Spinup Data Set service.

Endpoints

GET /v1/ds/ping
GET /v1/ds/version
GET /v1/ds/metrics

POST /v1/ds/{account}/datasets/{group}
GET /v1/ds/{account}/datasets/{group}/{id}
PATCH /v1/ds/{account}/datasets/{group}/{id}
PUT /v1/ds/{account}/datasets/{group}/{id}
DELETE /v1/ds/{account}/datasets/{group}/{id}

POST /v1/ds/{account}/datasets/{group}/{id}/attachments
DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments
GET /v1/ds/{account}/datasets/{group}/{id}/attachments

GET /v1/ds/{account}/datasets/{group}/{id}/instances
POST /v1/ds/{account}/datasets/{group}/{id}/instances
DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}

GET /v1/ds/{account}/datasets/{group}/{id}/logs

GET /v1/ds/{account}/datasets/{group}/{id}/users
POST /v1/ds/{account}/datasets/{group}/{id}/users
DELETE /v1/ds/{account}/datasets/{group}/{id}/users
PUT /v1/ds/{account}/datasets/{group}/{id}/users

Usage

Create a dataset

POST /v1/ds/{account}/datasets/{group}

{
    "name": "awesome-dataset-of-stuff",
    "type": "s3",
    "derivative": true,
    "tags": [
        { "key": "Application", "value": "ButWhyyyyy" },
        { "key": "COA", "value": "Take.My.Money" },
        { "key": "CreatedBy", "value": "SomeGuy" }
    ],
    "metadata": {
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2018-03-28T07:36:01.123Z",
        "created_by": "drzoidberg",
        "data_classifications": ["hipaa","pii"],
        "data_format": "file",
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "modified_at": "2019-03-28T07:36:01.123Z",
        "modified_by": "pfry",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": ["e15d2282-9c68-46b5-801c-2b5a62484624", "a7c082ee-f711-48fa-8a57-25c95b3a6ddd"]
    }
}

Response

{
    "id": "d37b375b-d136-4b17-8666-5036dc554a66",
    "repository": "dataset-localdev-d37b375b-d136-4b17-8666-5036dc554a66",
    "metadata": {
        "id": "d37b375b-d136-4b17-8666-5036dc554a66",
        "name": "awesome-dataset-of-stuff",
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2020-03-11T18:41:32Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": true,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "modified_at": "2020-03-11T18:41:32Z",
        "modified_by": "pfry",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "e15d2282-9c68-46b5-801c-2b5a62484624",
            "a7c082ee-f711-48fa-8a57-25c95b3a6ddd"
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 202 Accepted | creation request accepted |
| 400 Bad Request | badly formed request |
| 403 Forbidden | you don't have access to bucket |
| 404 Not Found | account not found |
| 409 Conflict | bucket or iam policy already exists |
| 429 Too Many Requests | service or rate limit exceeded |
| 500 Internal Server Error | a server error occurred |
| 503 Service Unavailable | an AWS service is unavailable |
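
As a rough illustration of the create call, the following Go sketch marshals a subset of the request body shown above and builds the POST request without sending it. The type names, the `newCreateRequest` helper, the base URL, and the `Content-Type` header are assumptions made for this example and are not taken from the ds-api source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Tag mirrors one entry of the "tags" array in the example request.
type Tag struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// Metadata carries a subset of the "metadata" fields from the example.
type Metadata struct {
	Description         string   `json:"description"`
	CreatedBy           string   `json:"created_by"`
	DataClassifications []string `json:"data_classifications"`
	DataFormat          string   `json:"data_format"`
}

// CreateRequest is a hypothetical client-side type for the request body.
type CreateRequest struct {
	Name       string   `json:"name"`
	Type       string   `json:"type"`
	Derivative bool     `json:"derivative"`
	Tags       []Tag    `json:"tags"`
	Metadata   Metadata `json:"metadata"`
}

// newCreateRequest builds (but does not send) the POST request.
func newCreateRequest(baseURL, account, group string, body CreateRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/v1/ds/%s/datasets/%s",
		baseURL, url.PathEscape(account), url.PathEscape(group))
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	// Assumed: the API expects a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateRequest("http://localhost:8080", "myaccount", "mygroup", CreateRequest{
		Name:       "awesome-dataset-of-stuff",
		Type:       "s3",
		Derivative: true,
		Tags:       []Tag{{Key: "Application", Value: "ButWhyyyyy"}},
		Metadata: Metadata{
			Description:         "The hugest dataset of awesome stuff",
			CreatedBy:           "drzoidberg",
			DataClassifications: []string{"hipaa", "pii"},
			DataFormat:          "file",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the result to `http.DefaultClient.Do` would send it; a 202 response then indicates the creation request was accepted.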

Get information about a dataset

GET /v1/ds/{account}/datasets/{group}/{id}

Response

{
    "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
    "metadata": {
        "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "name": "awesome-dataset-of-stuff",
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2020-03-16T15:38:14Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": true,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "modified_at": "2020-03-16T15:38:14Z",
        "modified_by": "pfry",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "d37b375b-d136-4b17-8666-5036dc554a66"
        ]
    },
    "repository": {
        "name": "dataset-localdev-bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "empty": false,
        "tags": [
            {
                "key": "CreatedBy",
                "value": "SomeGuy"
            },
            {
                "key": "spinup:org",
                "value": "localdev"
            },
            {
                "key": "ID",
                "value": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8"
            },
            {
                "key": "COA",
                "value": "Take.My.Money"
            },
            {
                "key": "Application",
                "value": "ButWhyyyyy"
            },
            {
                "key": "Name",
                "value": "awesome-dataset-of-stuff"
            }
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |
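
A client might decode this response along the following lines. This is a minimal sketch: the `Dataset` type and the helper names are hypothetical, and only a few of the fields shown above are mapped. Note that in this response `repository` is an object, whereas the create response returns it as a plain string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Tag mirrors one entry of the repository "tags" array.
type Tag struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// Dataset maps a subset of the GET response; the type is hypothetical.
type Dataset struct {
	ID       string `json:"id"`
	Metadata struct {
		Name       string   `json:"name"`
		Derivative bool     `json:"derivative"`
		SourceIDs  []string `json:"source_ids"`
	} `json:"metadata"`
	Repository struct {
		Name  string `json:"name"`
		Empty bool   `json:"empty"`
		Tags  []Tag  `json:"tags"`
	} `json:"repository"`
}

// decodeDataset parses a GET response body.
func decodeDataset(data []byte) (Dataset, error) {
	var ds Dataset
	err := json.Unmarshal(data, &ds)
	return ds, err
}

// tagValue returns the value of the first tag with the given key.
func tagValue(tags []Tag, key string) (string, bool) {
	for _, t := range tags {
		if t.Key == key {
			return t.Value, true
		}
	}
	return "", false
}

func main() {
	body := []byte(`{
		"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
		"metadata": {"name": "awesome-dataset-of-stuff", "derivative": true,
			"source_ids": ["d37b375b-d136-4b17-8666-5036dc554a66"]},
		"repository": {"name": "dataset-localdev-bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
			"empty": false, "tags": [{"key": "spinup:org", "value": "localdev"}]}
	}`)
	ds, err := decodeDataset(body)
	if err != nil {
		panic(err)
	}
	org, _ := tagValue(ds.Repository.Tags, "spinup:org")
	fmt.Println(ds.Repository.Name, org)
}
```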

Promote a dataset

PATCH /v1/ds/{account}/datasets/{group}/{id}

Headers:

X-Forwarded-User: awong

Response

{
    "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
    "metadata": {
        "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "name": "awesome-dataset-of-stuff",
        "description": "The hugest dataset of awesome stuff",
        "created_at": "2020-03-16T15:38:14Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": false,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "finalized_at": "2020-06-01T19:27:35Z",
        "finalized_by": "awong",
        "modified_at": "2020-06-01T19:27:35Z",
        "modified_by": "awong",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "d37b375b-d136-4b17-8666-5036dc554a66"
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 409 Conflict | dataset already finalized |
| 500 Internal Server Error | a server error occurred |
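
The sketch below shows one way a client could issue the promote call and interpret the documented status codes. `newPromoteRequest` and `promoteError` are hypothetical helper names, and the base URL is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// newPromoteRequest builds (but does not send) the PATCH request and sets
// the X-Forwarded-User header shown in the example.
func newPromoteRequest(baseURL, account, group, id, user string) (*http.Request, error) {
	endpoint := fmt.Sprintf("%s/v1/ds/%s/datasets/%s/%s", baseURL, account, group, id)
	req, err := http.NewRequest(http.MethodPatch, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Forwarded-User", user)
	return req, nil
}

// promoteError maps the documented status codes to an error.
// A nil result means the promotion succeeded.
func promoteError(status int) error {
	switch status {
	case http.StatusOK:
		return nil
	case http.StatusBadRequest:
		return errors.New("badly formed request")
	case http.StatusNotFound:
		return errors.New("dataset not found")
	case http.StatusConflict:
		return errors.New("dataset already finalized")
	case http.StatusInternalServerError:
		return errors.New("a server error occurred")
	default:
		return fmt.Errorf("unexpected status %d", status)
	}
}

func main() {
	req, err := newPromoteRequest("http://localhost:8080", "myaccount", "mygroup",
		"bb4f6316-53e2-45ae-97c7-fa7fd17f78a8", "awong")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(promoteError(http.StatusConflict))
}
```

Treating 409 as a distinct case lets a caller skip a retry when the dataset was already finalized.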

Update dataset metadata

PUT /v1/ds/{account}/datasets/{group}/{id}

Headers:

X-Forwarded-User: awong

Request:

{
	"metadata": {
		"description": "It's actually a tiny dataset"
	}
}

Response

{
    "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
    "metadata": {
        "id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
        "name": "awesome-dataset-of-stuff",
        "description": "It's actually a tiny dataset",
        "created_at": "2020-03-16T15:38:14Z",
        "created_by": "drzoidberg",
        "data_classifications": [
            "hipaa",
            "pii"
        ],
        "data_format": "file",
        "data_storage": "s3",
        "derivative": false,
        "dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
        "finalized_at": "2020-06-01T19:27:35Z",
        "finalized_by": "awong",
        "modified_at": "2020-06-01T21:31:05Z",
        "modified_by": "awong",
        "proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
        "source_ids": [
            "d37b375b-d136-4b17-8666-5036dc554a66"
        ]
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |

Delete a dataset

DELETE /v1/ds/{account}/datasets/{group}/{id}

Headers:

X-Forwarded-User: awong

| Response Code | Definition |
| --- | --- |
| 204 No Content | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |

Create attachment for a dataset

POST /v1/ds/{account}/datasets/{group}/{id}/attachments

The request must be multipart/form-data with the following parameters:

  • name - the name of the attachment as it should be saved, e.g. eula.txt
  • attachment - the content of the file being uploaded

Response

[
    "eula.txt"
]

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request, or file too big |
| 404 Not Found | dataset not found |
| 500 Internal Server Error | a server error occurred |
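
The upload body can be assembled with Go's `mime/multipart` package. The sketch below is an illustration only: it assumes `attachment` is sent as a file part (the text above does not say whether a plain form field would also be accepted), and `newAttachmentRequest` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newAttachmentRequest builds (but does not send) a multipart/form-data POST
// carrying the two documented parameters: "name" and "attachment".
func newAttachmentRequest(endpoint, name string, content []byte) (*http.Request, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	if err := w.WriteField("name", name); err != nil {
		return nil, err
	}
	// Assumed: the file content travels as a file part named "attachment".
	part, err := w.CreateFormFile("attachment", name)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(content); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary and must happen before sending.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, &buf)
	if err != nil {
		return nil, err
	}
	// The content type includes the boundary chosen by the writer.
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	endpoint := "http://localhost:8080/v1/ds/myaccount/datasets/mygroup/some-id/attachments"
	req, err := newAttachmentRequest(endpoint, "eula.txt", []byte("example terms"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```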

Delete attachment from a dataset

DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments

{
	"attachment_name": "dummy.doc"
}

Response

| Response Code | Definition |
| --- | --- |
| 204 No Content | attachment deleted, if it existed |
| 400 Bad Request | bad request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Get attachments for a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/attachments

Response

[
    {
        "Name": "Dataset Data Use Agreement.pdf",
        "Modified": "2020-05-17T02:04:27Z",
        "Size": 3708454,
        "URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/Dataset%20Data%20Use%20Agreement.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=342d937b7b726408c2efe41493d126ea577204f85ffe77ffc9b3cf22af80c7ea"
    },
    {
        "Name": "eula.txt",
        "Modified": "2020-05-18T13:19:34Z",
        "Size": 6920,
        "URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/eula.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=c2d7f7165ce3c099e8eefcb14e3b4c7e0e6a319af48d6727f25519f35488b14a"
    }
]

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |
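
The `URL` values in the example are presigned S3 links whose query string carries `X-Amz-Date` and `X-Amz-Expires` (300 seconds in the sample). The following sketch computes when such a link stops working. `presignedExpiry` is a hypothetical helper, and the URL in `main` is a trimmed stand-in for the full links above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// presignedExpiry returns the signing time plus the validity window encoded
// in a presigned URL's X-Amz-Date and X-Amz-Expires query parameters.
func presignedExpiry(rawURL string) (time.Time, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return time.Time{}, err
	}
	q := u.Query()
	// X-Amz-Date uses the compact ISO 8601 form, e.g. 20200518T132423Z.
	signedAt, err := time.Parse("20060102T150405Z", q.Get("X-Amz-Date"))
	if err != nil {
		return time.Time{}, err
	}
	secs, err := strconv.Atoi(q.Get("X-Amz-Expires"))
	if err != nil {
		return time.Time{}, err
	}
	return signedAt.Add(time.Duration(secs) * time.Second), nil
}

func main() {
	// Trimmed, hypothetical URL; real links also carry credential and signature parameters.
	link := "https://example-bucket.s3.amazonaws.com/_attachments/eula.txt" +
		"?X-Amz-Date=20200518T132423Z&X-Amz-Expires=300"
	exp, err := presignedExpiry(link)
	if err != nil {
		panic(err)
	}
	fmt.Println("link expires at", exp.Format(time.RFC3339))
}
```

Because the links in the sample are valid for only a few minutes, a client would likely fetch the attachment list again rather than cache the URLs.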

List all instances that have access to a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/instances

Response

{
    "id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
    "access": {
        "i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Grant dataset access to an instance

POST /v1/ds/{account}/datasets/{group}/{id}/instances

{
	"instance_id": "i-01f9bfb7ee683e807"
}

Response

{
    "id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
    "access": {
        "i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | instance access granted |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |
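
A small sketch of both halves of this call: building the request body and checking the returned `access` map for a given instance. The `InstanceAccess` type and the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InstanceAccess maps the response: the dataset id plus a map of
// instance id to the role name shown in the example.
type InstanceAccess struct {
	ID     string            `json:"id"`
	Access map[string]string `json:"access"`
}

// grantBody returns the JSON request body for granting access.
func grantBody(instanceID string) ([]byte, error) {
	return json.Marshal(map[string]string{"instance_id": instanceID})
}

// hasAccess reports whether the response lists the given instance.
func hasAccess(resp []byte, instanceID string) (bool, error) {
	var ia InstanceAccess
	if err := json.Unmarshal(resp, &ia); err != nil {
		return false, err
	}
	_, ok := ia.Access[instanceID]
	return ok, nil
}

func main() {
	body, err := grantBody("i-01f9bfb7ee683e807")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	resp := []byte(`{"id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
		"access": {"i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"}}`)
	ok, err := hasAccess(resp, "i-01f9bfb7ee683e807")
	if err != nil {
		panic(err)
	}
	fmt.Println("access granted:", ok)
}
```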

Revoke dataset access from an instance

DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}

| Response Code | Definition |
| --- | --- |
| 204 No Content | instance access revoked |
| 400 Bad Request | bad request, or instance doesn't have access |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |

Get audit logs for a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/logs

Response

[
    "11/19/2020, 17:07:28 - Created dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (CreatedBy: drzoidberg)",
    "11/19/2020, 17:51:39 - Updated metadata for dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)",
    "11/19/2020, 17:56:33 - Finalized original dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: me)"
]

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 500 Internal Server Error | a server error occurred |
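
Each log entry in the example is a single string of the form `MM/DD/YYYY, HH:MM:SS - message`. The sketch below splits such a line into a timestamp and a message. The format is inferred from the sample only, the time zone is not stated (so the parsed time is treated as UTC), and `parseLogLine` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// LogEntry is one parsed audit log line.
type LogEntry struct {
	Time    time.Time
	Message string
}

// parseLogLine splits a line on the first " - " and parses the leading
// timestamp using the layout seen in the sample response.
func parseLogLine(line string) (LogEntry, error) {
	parts := strings.SplitN(line, " - ", 2)
	if len(parts) != 2 {
		return LogEntry{}, fmt.Errorf("unexpected log line: %q", line)
	}
	ts, err := time.Parse("01/02/2006, 15:04:05", parts[0])
	if err != nil {
		return LogEntry{}, err
	}
	return LogEntry{Time: ts, Message: parts[1]}, nil
}

func main() {
	line := "11/19/2020, 17:51:39 - Updated metadata for dataset " +
		"3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)"
	entry, err := parseLogLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(entry.Time.Format(time.RFC3339), "|", entry.Message)
}
```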

Create a user for a dataset

POST /v1/ds/{account}/datasets/{group}/{id}/users

Request body is empty.

Response

{
    "user": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr",
    "group": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpGrp",
    "policy": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpPlc",
    "credentials": {
        "akid": "XXXXXXXXXXXXXXXXXXXX",
        "secret": "secretsecretsecretsecretsecretsecret"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 409 Conflict | user already exists |
| 500 Internal Server Error | a server error occurred |

Delete a user for a dataset

DELETE /v1/ds/{account}/datasets/{group}/{id}/users

Response

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset/user not found |
| 500 Internal Server Error | a server error occurred |

Get a user for a dataset

GET /v1/ds/{account}/datasets/{group}/{id}/users

Response

{
    "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr": {
        "keys": {
            "XXXXXXXXXXXXXXXXXXXX": "Inactive",
            "YYYYYYYYYYYYYYYYYYYY": "Active"
        }
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset/user not found |
| 500 Internal Server Error | a server error occurred |

Update a user's key for a dataset

PUT /v1/ds/{account}/datasets/{group}/{id}/users

Request body is empty.

Response

{
    "keys": {
        "XXXXXXXXXXXXXXXXXXXXX": "Inactive"
    },
    "credentials": {
        "akid": "YYYYYYYYYYYYYYYYYYYYY",
        "secret": "secretsecretsecretsecretsecretsecret"
    }
}

| Response Code | Definition |
| --- | --- |
| 200 OK | okay |
| 400 Bad Request | badly formed request |
| 404 Not Found | account/dataset not found |
| 429 Too Many Requests | maximum number of keys reached |
| 500 Internal Server Error | a server error occurred |
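
As an illustration, a client could decode the key update response and separate the new credentials from the keys that were marked inactive. The `KeyUpdate` type and the helper names below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// KeyUpdate maps the response: existing keys with their status, plus the
// newly issued credentials.
type KeyUpdate struct {
	Keys        map[string]string `json:"keys"`
	Credentials struct {
		AKID   string `json:"akid"`
		Secret string `json:"secret"`
	} `json:"credentials"`
}

// decodeKeyUpdate parses a key update response body.
func decodeKeyUpdate(data []byte) (KeyUpdate, error) {
	var ku KeyUpdate
	err := json.Unmarshal(data, &ku)
	return ku, err
}

// keysWithStatus returns the key ids that have the given status, sorted
// so the result is stable across runs.
func keysWithStatus(keys map[string]string, status string) []string {
	var out []string
	for id, s := range keys {
		if s == status {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	resp := []byte(`{
		"keys": {"XXXXXXXXXXXXXXXXXXXXX": "Inactive"},
		"credentials": {"akid": "YYYYYYYYYYYYYYYYYYYYY",
			"secret": "secretsecretsecretsecretsecretsecret"}
	}`)
	ku, err := decodeKeyUpdate(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println("new key:", ku.Credentials.AKID)
	fmt.Println("inactive keys:", keysWithStatus(ku.Keys, "Inactive"))
}
```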

Authentication

Authentication is accomplished using a pre-shared key (hashed string) in the X-Auth-Token header.
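
A minimal sketch of attaching the token to a request follows. How the pre-shared key is hashed is not described here, so the example treats the header value as an opaque string supplied by the caller. `newAuthedRequest`, the base URL, and the token value are all placeholders.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthedRequest builds (but does not send) a request with the
// X-Auth-Token header set to the caller-supplied token.
func newAuthedRequest(method, endpoint, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Auth-Token", token)
	return req, nil
}

func main() {
	endpoint := "http://localhost:8080/v1/ds/myaccount/datasets/mygroup/some-id"
	req, err := newAuthedRequest(http.MethodGet, endpoint, "example-token-value")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, "token set:", req.Header.Get("X-Auth-Token") != "")
}
```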

API Configuration

API configuration is via config/config.json; an example config file is provided.

You can specify a single metadataRepository where metadata about all the different data sets will be stored. Currently, the only supported type is s3, so you need to provide an S3 bucket and credentials with full access to that bucket. For example, if you created a bucket called spinup-example-metadata-repository, then the IAM policy would be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::spinup-example-metadata-repository",
                "arn:aws:s3:::spinup-example-metadata-repository/*"
            ]
        }
    ]
}
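
If you manage several environments, the same policy document can be generated for any bucket name. The following sketch reproduces the policy above from a bucket name; the `Policy` and `Statement` types and the `metadataRepoPolicy` helper are hypothetical and are not part of ds-api.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement is one IAM policy statement with a single action string.
type Statement struct {
	Effect   string   `json:"Effect"`
	Action   string   `json:"Action"`
	Resource []string `json:"Resource"`
}

// Policy is a minimal IAM policy document.
type Policy struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// metadataRepoPolicy grants full S3 access to the bucket and its objects,
// matching the example policy for the metadata repository.
func metadataRepoPolicy(bucket string) Policy {
	return Policy{
		Version: "2012-10-17",
		Statement: []Statement{{
			Effect: "Allow",
			Action: "s3:*",
			Resource: []string{
				"arn:aws:s3:::" + bucket,
				"arn:aws:s3:::" + bucket + "/*",
			},
		}},
	}
}

func main() {
	doc, err := json.MarshalIndent(metadataRepoPolicy("spinup-example-metadata-repository"), "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc))
}
```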

You can then define a list of accounts for the actual dataset repositories - that's where the data sets will be stored. Currently, the only supported type is s3, so you need to provide credentials in each account with the appropriate S3 and IAM access. This is a good starting IAM policy if you don't modify the default name and path prefixes:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:*",
            "Resource": [
                "arn:aws:iam::*:role/spinup/dataset/*",
                "arn:aws:iam::*:instance-profile/spinup/dataset/*",
                "arn:aws:iam::*:group/spinup/dataset/*",
                "arn:aws:iam::*:user/spinup/dataset/*",
                "arn:aws:iam::*:policy/spinup/dataset/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:GetInstanceProfile",
                "iam:ListAttachedRolePolicies",
                "iam:PassRole"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3::*:dataset-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AssociateIamInstanceProfile",
                "ec2:DescribeIamInstanceProfileAssociations",
                "ec2:DescribeInstances",
                "ec2:DisassociateIamInstanceProfile"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:*:*:log-group:/spinup/ORG/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:ListTagsLogGroup",
                "logs:CreateLogStream",
                "logs:TagLogGroup",
                "logs:DescribeLogGroups",
                "logs:DeleteLogGroup",
                "logs:DescribeLogStreams",
                "logs:GetLogEvents",
                "logs:PutRetentionPolicy",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:log-group:/spinup/ORG/*:log-stream:*",
                "arn:aws:logs:*:*:log-group:/spinup/ORG/*"
            ]
        }
    ]
}

Dataset groups

When creating a data set you need to specify the group it belongs to. The group can be any arbitrary string; it simply provides a way to group similar data sets together (e.g. data sets that are part of the same application or department). Currently, the group is only used for logging purposes, but eventually it will play a more significant role.

Authors

E Camden Fisher
Tenyo Grozev

License

GNU Affero General Public License v3.0 (GNU AGPLv3)
Copyright (c) 2020 Yale University

# Variables

Buildstamp is the timestamp at which the binary was built; it should be set at build time with ldflags.
Githash is the git sha of the built binary; it should be set at build time with ldflags.
Version is the main version number.
VersionPrerelease is a prerelease marker.
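
For readers unfamiliar with setting variables through ldflags, here is a minimal sketch. The variable names come from the list above, but the default values, the `main` package path in the build command, and the `versionString` helper are assumptions made for illustration.

```go
package main

import "fmt"

// Defaults used when nothing is injected at build time. A build could
// override the string variables with the linker's -X flag, for example:
//
//	go build -ldflags "-X main.Buildstamp=$(date -u +%Y%m%d%H%M%S) -X main.Githash=$(git rev-parse HEAD)"
//
// The main package path is an assumption; -X needs the full import path of
// the package that actually declares the variables.
var (
	Version           = "0.0.0"
	VersionPrerelease = ""
	Buildstamp        = "unset"
	Githash           = "unset"
)

// versionString joins the version, optional prerelease marker, build
// timestamp, and git hash into one line.
func versionString() string {
	v := Version
	if VersionPrerelease != "" {
		v += "-" + VersionPrerelease
	}
	return fmt.Sprintf("%s (built %s, commit %s)", v, Buildstamp, Githash)
}

func main() {
	fmt.Println(versionString())
}
```

The -X flag only works on package-level string variables, which is why these are declared with `var` rather than `const`.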