gcube-cms-suite/use-cases/notebooks/gCube CMS.ipynb

gCube CMS

gCube CMS is a software application that allows users to collaborate in the creation, editing, and production of complex digital components called Projects.

The application is built on top of gCube Infrastructure features, for maximum integration with Virtual Research Environments (VREs) and supports :

  • Complex Data Management : Projects are made of core Metadata, custom Metadata, multiple linked FileSets
  • Versioning, Workflows, Access Policies integrated with the Infrastructure User Role Management
  • Materializations (e.g. SDI Layer, Image Thumbnail, DBMS)
  • Indexing (e.g. GIS Catalogues, Centroids Layer, CKAN catalogue)
  • Event Triggering (e.g. Notifications)

The application is comprised of various components, including :

  • GUIs to facilitate end users in managing and consuming Projects
  • Service : implements the core business logic
  • Plugins : a set of modular logic implementations (both general-purpose and custom-made) aimed at dealing with Project Lifecycle Phases and Events

Concept

Parameters Init

Customize this section for different notebook behaviour and initialization.

In [1]:
%run commons.ipynb
Requirement already satisfied: simplejson in /opt/conda/lib/python3.9/site-packages (3.17.6)
Requirement already satisfied: rich[jupyter] in /opt/conda/lib/python3.9/site-packages (12.2.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /opt/conda/lib/python3.9/site-packages (from rich[jupyter]) (0.9.1)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /opt/conda/lib/python3.9/site-packages (from rich[jupyter]) (2.11.2)
Requirement already satisfied: ipywidgets<8.0.0,>=7.5.1 in /opt/conda/lib/python3.9/site-packages (from rich[jupyter]) (7.7.0)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.1.0)
Requirement already satisfied: ipython-genutils~=0.2.0 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.2.0)
Requirement already satisfied: nbformat>=4.2.0 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (5.2.0)
Requirement already satisfied: traitlets>=4.3.1 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (5.1.1)
Requirement already satisfied: ipython>=4.0.0 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (8.2.0)
Requirement already satisfied: widgetsnbextension~=3.6.0 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (3.6.0)
Requirement already satisfied: ipykernel>=4.5.1 in /opt/conda/lib/python3.9/site-packages (from ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (6.11.0)
Requirement already satisfied: jupyter-client>=6.1.12 in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (7.2.1)
Requirement already satisfied: tornado>=6.1 in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (6.1)
Requirement already satisfied: psutil in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (5.9.0)
Requirement already satisfied: setuptools>=60 in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (61.3.1)
Requirement already satisfied: matplotlib-inline>=0.1 in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.1.3)
Requirement already satisfied: debugpy>=1.0 in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.5.1)
Requirement already satisfied: nest-asyncio in /opt/conda/lib/python3.9/site-packages (from ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.5.5)
Requirement already satisfied: pexpect>4.3 in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (4.8.0)
Requirement already satisfied: jedi>=0.16 in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.18.1)
Requirement already satisfied: backcall in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.2.0)
Requirement already satisfied: stack-data in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.2.0)
Requirement already satisfied: pickleshare in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.7.5)
Requirement already satisfied: decorator in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (5.1.1)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/conda/lib/python3.9/site-packages (from ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (3.0.27)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /opt/conda/lib/python3.9/site-packages (from nbformat>=4.2.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (4.4.0)
Requirement already satisfied: jupyter-core in /opt/conda/lib/python3.9/site-packages (from nbformat>=4.2.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (4.9.2)
Requirement already satisfied: notebook>=4.4.1 in /opt/conda/lib/python3.9/site-packages (from widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (6.4.10)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /opt/conda/lib/python3.9/site-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.8.3)
Requirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.9/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (21.4.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /opt/conda/lib/python3.9/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.18.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.9/site-packages (from jupyter-client>=6.1.12->ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (2.8.2)
Requirement already satisfied: entrypoints in /opt/conda/lib/python3.9/site-packages (from jupyter-client>=6.1.12->ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.4)
Requirement already satisfied: pyzmq>=22.3 in /opt/conda/lib/python3.9/site-packages (from jupyter-client>=6.1.12->ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (22.3.0)
Requirement already satisfied: terminado>=0.8.3 in /opt/conda/lib/python3.9/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.13.3)
Requirement already satisfied: Send2Trash>=1.8.0 in /opt/conda/lib/python3.9/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.8.0)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.9/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (3.1.1)
Requirement already satisfied: argon2-cffi in /opt/conda/lib/python3.9/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (21.3.0)
Requirement already satisfied: nbconvert>=5 in /opt/conda/lib/python3.9/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (6.4.5)
Requirement already satisfied: prometheus-client in /opt/conda/lib/python3.9/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.13.1)
Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.9/site-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.7.0)
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.9/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.2.5)
Requirement already satisfied: asttokens in /opt/conda/lib/python3.9/site-packages (from stack-data->ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (2.0.5)
Requirement already satisfied: executing in /opt/conda/lib/python3.9/site-packages (from stack-data->ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.8.3)
Requirement already satisfied: pure-eval in /opt/conda/lib/python3.9/site-packages (from stack-data->ipython>=4.0.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.2.2)
Requirement already satisfied: beautifulsoup4 in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (4.10.0)
Requirement already satisfied: bleach in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (4.1.0)
Requirement already satisfied: mistune<2,>=0.8.1 in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.8.4)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.5.13)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (2.1.1)
Requirement already satisfied: testpath in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.6.0)
Requirement already satisfied: defusedxml in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.7.1)
Requirement already satisfied: jupyterlab-pygments in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.1.2)
Requirement already satisfied: pandocfilters>=1.4.1 in /opt/conda/lib/python3.9/site-packages (from nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.5.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.9/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.12->ipykernel>=4.5.1->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.16.0)
Requirement already satisfied: argon2-cffi-bindings in /opt/conda/lib/python3.9/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (21.2.0)
Requirement already satisfied: cffi>=1.0.1 in /opt/conda/lib/python3.9/site-packages (from argon2-cffi-bindings->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (1.15.0)
Requirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.9/site-packages (from beautifulsoup4->nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (2.3.1)
Requirement already satisfied: webencodings in /opt/conda/lib/python3.9/site-packages (from bleach->nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (0.5.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.9/site-packages (from bleach->nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (21.3)
Requirement already satisfied: pycparser in /opt/conda/lib/python3.9/site-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (2.21)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.9/site-packages (from packaging->bleach->nbconvert>=5->notebook>=4.4.1->widgetsnbextension~=3.6.0->ipywidgets<8.0.0,>=7.5.1->rich[jupyter]) (3.0.7)

Projects

Projects are the main entities in gCube CMS. They represent a complex document comprising its metadata, its datasets, and the related information needed to properly consume them. The present section guides you through the management of CMS Projects, explaining their model and all supported operations.

Projects encapsulate the user's JSON Document in a richer format comprising :

  • Core Metadata : Metadata information used by the service to manage the project lifecycle
  • The Document : A JSON Object compliant with the user-defined schema, reporting both
    • The actual metadata regarding the managed documents
    • Attachments references to payloads and generated materializations

Projects are linked to Use Case Descriptors (UCDs), which define how their Document contents are going to be treated in the system.

More detailed information can be found at https://gcube.wiki.gcube-system.org/gcube/GeoPortal_Service

NB: In this notebook we are going to use the UCD "basicUCD". If you want to learn more about UCDs, please refer to wiki pages or related notebooks.

NB: Users can experiment with different UCDs, but these need to be published in the current infrastructure Context.

Create New

Start this tutorial by creating your new Project.

In this section we are going to create a new simple document and inspect the result. You will learn the basic sections of a Project Metadata and their meaning.

It is sufficient to perform a POST operation with our document as the body.

The REST base endpoint for managing projects is SERVICE_BASE_ENDPOINT/projects/UCID where:

  • SERVICE_BASE_ENDPOINT should be discovered from the infrastructure's Information System
  • UCID is the ID of the Use Case Descriptor to be used with this project
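
The endpoint pattern above can be sketched as a small URL builder. This is only an illustration: `build_projects_url` is a hypothetical helper (the notebook itself relies on the `projects_endpoint` variable prepared by commons.ipynb, whose construction is not shown here), and the base URL is the development endpoint that appears in the logged requests.

```python
from urllib.parse import quote


def build_projects_url(service_base_endpoint: str, ucid: str, *segments: str) -> str:
    """Compose SERVICE_BASE_ENDPOINT/projects/UCID followed by optional extra path segments."""
    # Percent-encode every path segment so that unusual IDs cannot break the URL.
    tail = [quote(str(s), safe="") for s in ("projects", ucid, *segments)]
    return "/".join([service_base_endpoint.rstrip("/"), *tail])


# Base endpoint taken from the request log of this notebook (development infrastructure).
base = "https://geoportal.dev.int.d4science.net/geoportal-service/srv"
print(build_projects_url(base, "basic"))
```

Extra segments can be passed for the operations described later, e.g. `build_projects_url(base, "basic", "registerFiles", project_id)`.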
In [2]:
# Edit this section to customize your document
doc={"myTitle":"Some Big Title"}
In [3]:
project= send(method="POST",dest=projects_endpoint,data=json.dumps(doc))

print("Resulting project is ")
print_json(data=project)
GOING TO SEND POST REQUEST TO https://geoportal.dev.int.d4science.net/geoportal-service/srv/projects/basic
Data is {"myTitle": "Some Big Title"}
RESPONSE STATUS 200
Resulting project is 
{
  "_id": "625042b202ad3d6126dfe5b5",
  "_version": "1.0.0",
  "_info": {
    "_creationInfo": {
      "_user": {
        "_username": "FAKE"
      },
      "_context": {
        "_id": "/gcube/devsec/devVRE",
        "_name": "/devVRE"
      },
      "_instant": "2022-04-08T16:12:01.449"
    },
    "_lastEditInfo": null,
    "_access": {
      "_policy": "OPEN",
      "_license": ""
    }
  },
  "_profileID": "basic",
  "_profileVersion": "1.0.0",
  "_lifecycleInformation": {
    "_phase": "DRAFT",
    "_lastInvokedStep": null,
    "_lastOperationStatus": "OK",
    "_errorMessages": [],
    "_warningMessages": [],
    "_triggeredEvents": [
      {
        "event": "INIT_DOCUMENT",
        "lastOperationStatus": "OK",
        "errorMessages": null,
        "warningMessages": null
      }
    ],
    "_notes": null
  },
  "_relationships": null,
  "_spatialReference": null,
  "_temporalReference": null,
  "_theDocument": {
    "myTitle": "Some Big Title"
  },
  "_lock": null
}

Core Metadata Section

Registered JSON Documents are wrapped in Projects. Let's inspect the registered JSON object structure and the most useful information in it.

The field _theDocument represents our document of interest.

NB : Depending on the linked UCD, the returned document might differ from the one that was submitted. This happens when the UCD is configured to automatically perform some operation at creation time (e.g. "set default values").

In [4]:
print_json(data=project["_theDocument"])
{
  "myTitle": "Some Big Title"
}

ID

The field _id is the unique identifier of the project. We are going to use it for all REST operations that involve our project.

In [5]:
print_json(data=project["_id"])
"625042b202ad3d6126dfe5b5"

Accounting Information

The field _info contains the following accounting information related to the project :

  • _creationInfo : user, context and instant of creation
  • _lastEditInfo : user, context and instant of last edit operation
  • _access : project access policy (can be OPEN, RESTRICTED or EMBARGOED), project license
In [6]:
print_json(data=project["_info"])
{
  "_creationInfo": {
    "_user": {
      "_username": "FAKE"
    },
    "_context": {
      "_id": "/gcube/devsec/devVRE",
      "_name": "/devVRE"
    },
    "_instant": "2022-04-08T16:12:01.449"
  },
  "_lastEditInfo": null,
  "_access": {
    "_policy": "OPEN",
    "_license": ""
  }
}

Lifecycle Information

The field _lifecycleInformation contains a structured report of the status of the project, containing:

  • _phase : The workflow phase in which the document is at the moment
  • _lastInvokedStep : Report on the last executed workflow STEP
  • _triggeredEvents : Collections of reports regarding the triggered events
  • _notes : User defined notes on the status of the project
  • _errorMessages : Human readable error messages, if any
  • _warningMessages : Human readable warning messages, if any

See Plugins section for more information.

In [7]:
print_json(data=project["_lifecycleInformation"])
{
  "_phase": "DRAFT",
  "_lastInvokedStep": null,
  "_lastOperationStatus": "OK",
  "_errorMessages": [],
  "_warningMessages": [],
  "_triggeredEvents": [
    {
      "event": "INIT_DOCUMENT",
      "lastOperationStatus": "OK",
      "errorMessages": null,
      "warningMessages": null
    }
  ],
  "_notes": null
}
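
The report above can be checked programmatically before moving on. The following is a minimal sketch, assuming that "OK" is the only status value meaning success (it is the only value shown in this notebook); `lifecycle_problems` is a hypothetical helper, not part of the service client.

```python
def lifecycle_problems(project: dict) -> list:
    """Collect the problems reported in a Project's lifecycle information."""
    info = project.get("_lifecycleInformation") or {}
    problems = list(info.get("_errorMessages") or [])
    # Assumption: any status other than "OK" signals a failed operation.
    if info.get("_lastOperationStatus") not in (None, "OK"):
        problems.append("last operation status: " + str(info["_lastOperationStatus"]))
    for event in info.get("_triggeredEvents") or []:
        # Event reports use the same keys without the leading underscore.
        problems.extend(event.get("errorMessages") or [])
        if event.get("lastOperationStatus") not in (None, "OK"):
            problems.append(f"event {event.get('event')}: {event.get('lastOperationStatus')}")
    return problems


# The lifecycle report printed above, written as a Python dict.
sample = {
    "_lifecycleInformation": {
        "_phase": "DRAFT",
        "_lastInvokedStep": None,
        "_lastOperationStatus": "OK",
        "_errorMessages": [],
        "_warningMessages": [],
        "_triggeredEvents": [
            {"event": "INIT_DOCUMENT", "lastOperationStatus": "OK",
             "errorMessages": None, "warningMessages": None}
        ],
        "_notes": None,
    }
}
print(lifecycle_problems(sample))  # prints []
```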

Spatial Reference

The field _spatialReference contains a GeoJSON geometry identifying the Project.

For example :

"_spatialReference":
  {
    "type": "Point",
    "bbox":
    [
      8.621178639172953,
      40.62975046683799,
      40.630257904721645,
      8.62091913167495
    ],
    "coordinates":
    [
      -32.00907926554869,
      -32.00883133516304
    ]
  }

Temporal Reference

TBD

Editing a project

Edit the Document section of a Project.

The Document section is the actual document that needs to be managed, and the most basic operation is updating it. To do this, it is sufficient to perform a PUT operation with our document as the body, specifying our project's ID.

The endpoint for this operation is going to be SERVICE_BASE_ENDPOINT/projects/UCID/ID where:

  • SERVICE_BASE_ENDPOINT should be discovered from the infrastructure's Information System
  • UCID is the ID of the Use Case Descriptor to be used with this project
  • ID is our project Unique Identifier
In [8]:
#We update our document nesting a sub section in it
doc={
    "myTitle":"My newer edited title",
    "section":{"sectionTitle":"My new SubSection"}
}
In [9]:
# Perform the edit and print the updated document
project = send(method="PUT",data=json.dumps(doc),dest=projects_endpoint+"/"+str(project["_id"]))
print_json(data=project["_theDocument"])
GOING TO SEND PUT REQUEST TO https://geoportal.dev.int.d4science.net/geoportal-service/srv/projects/basic/625042b202ad3d6126dfe5b5
Data is {"myTitle": "My newer edited title", "section": {"sectionTitle": "My new SubSection"}}
RESPONSE STATUS 200
{
  "myTitle": "My newer edited title",
  "section": {
    "sectionTitle": "My new SubSection"
  }
}

Upload Attachments : FileSets

Projects' documents can contain multiple collections of files called FileSets.

We use the concept of FileSet to identify a collection of files that should be managed together (e.g. a GIS shapefile along with its SHX file). In this section we are going to register a FileSet in our project.

The operation triggers :

  • the upload and archiving of the passed data streams
  • the registration of the files in "_theDocument", precisely at the JSON path declared in the request

NB Please be aware that FileSet JSON paths MUST be defined in the schema section of our UCD.

In order to perform this operation we are going to create a JSON object request with at least the following information :

  • fieldDefinitionPath : a JSON path pointing to the field definition in UCD schema [NB root is "schema"]
  • parentPath : a JSON path pointing to the parent of our target element [NB root is "_theDocument"]
  • fieldName : the target element name at which the FileSet is going to be registered
  • streams : a collection of JSON objects, each containing
    • id : the infrastructure STORAGE volatile ID of our File
    • url : [Alternative to ID] the url at which our File can be downloaded.
    • filename : our File name
In [10]:
registerFilesetRequest={
    "fieldDefinitionPath":"$.section._children[?(@.fileset)]",
    "parentPath":"$.section",
    "fieldName":"fileset",
    "streams":[{"url":"https://www.cnr.it/sites/all/themes/custom/cnr03_theme/img/cnrlogo.svg","filename":"myLogo.svg"}]
}
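
The request above can also be assembled by a small helper that catches incomplete stream entries before anything is sent. This is a sketch only: `build_register_fileset_request` is a hypothetical name, and the checks are client-side assumptions derived from the field list above (each stream carries a filename plus either an id or a url), not the service's own validation rules.

```python
def build_register_fileset_request(field_definition_path: str, parent_path: str,
                                   field_name: str, streams: list) -> dict:
    """Build the JSON body for a registerFiles call, checking each stream entry."""
    checked = []
    for stream in streams:
        if not stream.get("filename"):
            raise ValueError("each stream needs a 'filename'")
        # 'url' is documented as an alternative to the storage 'id'.
        if not (stream.get("id") or stream.get("url")):
            raise ValueError("each stream needs either an 'id' or a 'url'")
        checked.append(dict(stream))
    return {
        "fieldDefinitionPath": field_definition_path,
        "parentPath": parent_path,
        "fieldName": field_name,
        "streams": checked,
    }


request_body = build_register_fileset_request(
    "$.section._children[?(@.fileset)]",
    "$.section",
    "fileset",
    [{"url": "https://www.cnr.it/sites/all/themes/custom/cnr03_theme/img/cnrlogo.svg",
      "filename": "myLogo.svg"}],
)
```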

Once our request is ready we can send it to the server with a POST request to SERVICE_BASE_ENDPOINT/projects/UCID/registerFiles/ID where:

  • SERVICE_BASE_ENDPOINT should be discovered from the infrastructure's Information System
  • UCID is the ID of the Use Case Descriptor to be used with this project
  • ID is our project Unique Identifier
In [11]:
project = send(method="POST",dest=projects_endpoint+"/registerFiles/"+str(project["_id"]),data=json.dumps(registerFilesetRequest))

print("Our Registered FileSet")
print_json(data=project["_theDocument"]["section"]["fileset"])
GOING TO SEND POST REQUEST TO https://geoportal.dev.int.d4science.net/geoportal-service/srv/projects/basic/registerFiles/625042b202ad3d6126dfe5b5
Data is {"fieldDefinitionPath": "$.section._children[?(@.fileset)]", "parentPath": "$.section", "fieldName": "fileset", "streams": [{"url": "https://www.cnr.it/sites/all/themes/custom/cnr03_theme/img/cnrlogo.svg", "filename": "myLogo.svg"}]}
RESPONSE STATUS 500
CAUSE 
<!doctype html><html lang="en"><head><title>HTTP Status 500  Internal Server Error</title><style type="text/css">h1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} h2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} h3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} body {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} b {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} p {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;} a {color:black;} a.name {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 500  Internal Server Error</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> Internal Server Error</p><p><b>Description</b> The server encountered an unexpected condition that prevented it from fulfilling the request.</p><hr class="line" /><h3>Apache Tomcat/8.5.39 (Ubuntu)</h3></body></html>
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 project = send(method="POST",dest=projects_endpoint+"/registerFiles/"+str(project["_id"]),data=json.dumps(registerFilesetRequest))
      3 print("Our Registered FileSet")
      4 print_json(data=project["_theDocument"]["section"]["fileset"])

Input In [1], in send(dest, data, method, params)
     38        print("CAUSE "+str(response.reason))
     39        print(response.text)
---> 40 return json.loads(response.text)

File /opt/conda/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File /opt/conda/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File /opt/conda/lib/python3.9/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
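
The traceback above is raised on the client side: the service answered with HTTP 500 and an HTML error page, and the helper still tried to parse the body as JSON. A minimal sketch of a more defensive decoding step follows, assuming the helper has the status code and the body text at hand; `ServiceError` and `decode_response` are hypothetical names, not part of commons.ipynb.

```python
import json


class ServiceError(Exception):
    """Raised when the service answers with an error or a non-JSON body."""


def decode_response(status_code: int, text: str):
    """Return the parsed JSON body, or raise ServiceError with a short diagnostic."""
    if not 200 <= status_code < 300:
        # Error pages are often HTML: report them instead of parsing them.
        raise ServiceError(f"HTTP {status_code}: {text[:200]}")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ServiceError(f"response is not valid JSON: {err}") from err


print(decode_response(200, '{"_id": "625042b202ad3d6126dfe5b5"}'))
```

With such a guard, the failed registerFiles call would surface as a single `ServiceError` reporting the status code rather than as a `JSONDecodeError`.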

Workflow operation : Execute STEP

Execute operations on a Project in order to manage its lifecycle.

A Project's lifecycle may involve many operations (usually performed by different users) that should be performed in a predefined order, e.g. :

  • registration
  • approval
  • rejection
  • validation
  • notifications
  • processing :
    • materialization of FileSets in dedicated engines in order to properly consume them (e.g. GIS, DBs, Catalogues, Data Analytics..)
    • data analytics

We can then visualize our document as passing through different PHASES of its lifecycle (e.g. DRAFT, APPROVED ...). We call STEPs the operations that may alter the PHASE of a project by :

  • performing a predefined operation on our document
  • [optionally] triggering generic events (e.g. notifications)
  • altering a Project lifecycle information (reporting outcome, status, messages etc.)

STEPs are implemented in LifecycleManager plugins. The suite comes with a default set of predefined, general-purpose plugins that can help manage the user's project. This set can be extended with custom implementations.

NB : Project lifecycles are configured in the linked Use Case Descriptors. Each Project starts in the DRAFT PHASE and then changes according to the configured Lifecycle Manager.

A STEP is invoked by sending a POST request to SERVICE_BASE_ENDPOINT/projects/UCID/step/ID where:

  • SERVICE_BASE_ENDPOINT should be discovered from the infrastructure's Information System
  • UCID is the ID of the Use Case Descriptor to be used with this project
  • ID is our project Unique Identifier

Depending on implementations and configurations, the invoked STEP may expect or require some additional parameters. These can be specified in the request body as a simple JSON object. Please refer to the specific plugin documentation for details.

In [ ]:
#NB Assuming UCID = "basic"
#Prepare STEP execution request
STEPrequest={"stepID" : "PUBLISH"}
project= send(method="POST",dest=projects_endpoint+"/step/"+str(project["_id"]),data=json.dumps(STEPrequest))

print("Our resulting project new lifecycle information")
print_json(data=project["_lifecycleInformation"])

Materialization

Learn how Filesets are materialized into enhanced resources (e.g. GIS Layers, DBMS Tables, Algorithms..)

As discussed above, FileSets may need to be processed in order to :

  • materialize them in dedicated engines in order to properly consume them (e.g. GIS, DBs, Catalogues, Data Analytics..)
  • perform data analytics tasks

If these processes generate some kind of consumable resource, it is registered in the FileSet as a Materialization.

If the above execution went well (check for errors in the displayed lifecycle information), our FileSet should have been materialized as a layer in our infrastructure SDI. See below the enhanced FileSet, with the newly generated information about our new layer.

NB Each FileSet Materialization type has a different structure, in order to adapt to the needs of the application.

In [ ]:
print_json(data=project["_theDocument"]["section"]["fileset"])

gCube SDI Layers

gCube SDI Layers are materializations of a FileSet in a gCube SDI. They expose the following information, allowing applications and users to consume the generated resources:

  • _ogcLinks : Open Geospatial Consortium standard URLs
  • _bbox
  • _platformInfo : collection of platform - specific JSON objects

Known platform info types are :

  • GeoServer
  • GeoNetwork (TBD)
  • Thredds (TBD)

Here is the layer generated in the above STEP execution

In [ ]:
#pipenv install arcgis
# try and visualize GIS 
In [ ]:
print_json(data=project["_theDocument"]["section"]["fileset"]["materializations"][0])

Accessing Projects

The service allows for both access and querying against the internal Document archive.

Querying

Queries can be performed with a POST HTTP Request to SERVICE_BASE_ENDPOINT/projects/UCID/query, with a JSON body containing the following fields :

  • filter : the JSON filter to apply (see MongoDB documentation for more details)
  • projection : the target format to use to represent the returned JSON objects (see MongoDB documentation for more details)
  • ordering : determines the ordering behaviour
    • direction : allowed values are ASCENDING and DESCENDING
    • fields : list of fields used for ordering
  • paging : determines the window of the paged request
    • offset
    • limit

This feature relies heavily on the underlying Document Store Engine, currently MongoDB. Please refer to the MongoDB documentation for more details.
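
The query body described above can be assembled with a small builder that only emits the fields actually requested. This is a sketch under assumptions: `build_query` is a hypothetical helper, and the "fields" entry of "ordering" is assumed to be a plain list of field names.

```python
import json
from typing import Optional, Sequence


def build_query(criteria: Optional[dict] = None,
                projection: Optional[dict] = None,
                order_by: Optional[Sequence[str]] = None,
                direction: str = "ASCENDING",
                offset: int = 0,
                limit: Optional[int] = None) -> dict:
    """Build the JSON body of a query request, omitting unused sections."""
    if direction not in ("ASCENDING", "DESCENDING"):
        raise ValueError("direction must be ASCENDING or DESCENDING")
    query = {}
    if criteria is not None:
        query["filter"] = criteria          # MongoDB-style filter document
    if projection is not None:
        query["projection"] = projection    # MongoDB-style projection document
    if order_by:
        # Assumption: ordering fields are given as a list of field names.
        query["ordering"] = {"direction": direction, "fields": list(order_by)}
    if limit is not None:
        query["paging"] = {"offset": offset, "limit": limit}
    return query


first_published = build_query(
    criteria={"_lifecycleInformation._phase": {"$eq": "PUBLISHED"}},
    order_by=["_theDocument.title"],
    limit=1,
)
print(json.dumps(first_published, indent=2))
```

Calling `build_query()` with no arguments yields the empty body `{}`, which corresponds to the "query all" case.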

The following cells provide some query examples.

In [ ]:
# Query all
queryAll = {}

# Query all with PHASE = PUBLISHED
queryPublished = {
    "filter" : {"_lifecycleInformation._phase": {"$eq" : "PUBLISHED"}}
}
# Add ordering
# NB "fields" is written as a list of field names (assumed format):
# a Python set such as {"x"} cannot be serialized by json.dumps
queryOrdering = {
    "filter" : {"_lifecycleInformation._phase": {"$eq" : "PUBLISHED"}},
    "ordering" :{"direction":"ASCENDING","fields":["_theDocument.title"]}
}
# Get only the first element
queryFirst = {
    "filter" : {"_lifecycleInformation._phase": {"$eq" : "PUBLISHED"}},
    "ordering" :{"direction":"ASCENDING","fields":["_theDocument.title"]},
    "paging" : {"offset":0,"limit":1}
}
# Get only title and author
# NB the author path follows the "_info" structure shown in the Accounting Information section
queryTitleAndAuthor = {
    "filter" : {"_lifecycleInformation._phase": {"$eq" : "PUBLISHED"}},
    "ordering" :{"direction":"ASCENDING","fields":["_theDocument.title"]},
    "paging" : {"offset":0,"limit":1},
    "projection" :{"_theDocument.title":1,"_info._creationInfo._user._username":1}
}
In [ ]:
#Actually performing query
found = send(dest=projects_endpoint+"/query",data=json.dumps(queryAll),method="POST")
print("Result Count :"+str(len(found)))
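
When the archive is large, the "paging" field can be used to walk through the results window by window. The sketch below assumes that the query endpoint returns a JSON list (as `len(found)` suggests); `iter_projects` and `fetch_page` are hypothetical names, and an in-memory list stands in for the service so that the example runs on its own.

```python
def iter_projects(fetch_page, page_size=50, criteria=None):
    """Yield all matching Projects, requesting one page at a time."""
    offset = 0
    while True:
        query = {"paging": {"offset": offset, "limit": page_size}}
        if criteria is not None:
            query["filter"] = criteria
        page = fetch_page(query)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        offset += len(page)


# Stand-in for the service: slices a local list according to the paging window.
fake_store = [{"_id": str(i)} for i in range(5)]


def fake_fetch(query):
    window = query["paging"]
    return fake_store[window["offset"]:window["offset"] + window["limit"]]


print(len(list(iter_projects(fake_fetch, page_size=2))))  # prints 5
```

In this notebook, `fetch_page` could wrap the existing helper, e.g. `lambda q: send(dest=projects_endpoint+"/query", data=json.dumps(q), method="POST")`.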

Get By ID

As per standard REST conventions, Projects can be obtained by their ID with a GET HTTP Request to SERVICE_BASE_ENDPOINT/projects/UCID/ID

In [ ]:
# Let's take an ID from the previous query
projectId=found[0]["_id"]
# NB the "/" separator is needed between the endpoint and the ID
project = send(method="GET",dest=projects_endpoint+"/"+str(projectId))
print_json(data=project)

Runtime Configuration

Learn about how the service exposes information on how to access and consume generated resources.

As a side effect of workflow management, the following resources may be generated at runtime and managed by the service :

  • Uploaded file archive : Based on gCube Workspace
  • Document archive : Document store based archive of Projects
  • Indexes : GIS, Textual, Documental indexes, DBMS

These resources are not linked to one particular Project, but are related to the management of a specific UCD. Applications that need to access these resources (e.g. GUIs) can access a Configuration report at SERVICE_BASE_ENDPOINT/projects/UCID/configuration.

In [ ]:
print_json(data=send(dest=projects_endpoint+"/configuration",method="GET"))

Use Case Descriptors - UCD

Use Case Descriptors (UCDs) are JSON documents containing the configuration settings needed for the management of our Projects, from the creation to their consumption.

In this section we are going to inspect both generic configurations (e.g. Data Access Policies) and some common Plugin configuration (e.g. Lifecycle Management)

In [ ]:
# loads UCD from service
queryUCDByID = {"filter" : {"_id":{"$eq":ucid}}}

UCD = send(method="POST",data=json.dumps(queryUCDByID),dest=ucd_endpoint+"query")

print_json(data=UCD)

UCD Format

UCD content can be extended to satisfy the needs of the involved components, but its basic structure is fixed.

The most notable UCD fields are :

  • _id : the Unique Identifier of the UCD
  • _schema : the schema of the document to be handled in Projects (see "_theDocument") linked to this UCD
  • _creationInfo : accounting information about author, time of creation, Infrastructure Context
  • _dataAccessPolicies : access rules based on user Role
  • _handlers : specific configurations for all involved components (both Plugins and GUIs)
In [ ]:
print_json(data=UCD)

Schema

This section represents the expected structure of a managed Document inside a Project linked to the present UCD.

In general it's an optional section of a UCD, but most Handlers and features (e.g. FileSet operations) rely on this section.

It is represented as a map of extensible Field objects with at least the following information :

  • _type : String
  • _max : Integer. Maximum cardinality (default is 1)
  • _min : Integer. Minimum cardinality (default is 0)
  • _label : String. Human-readable label
  • _children : Field Collection. Nested fields definition
In [ ]:
print_json(data=UCD["_schema"])
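
To make the Field structure concrete, the sketch below walks a schema and reports each field's cardinality, applying the defaults listed above. The example schema, its field names and its `_type` values are hypothetical. Whether `_children` is a map or a list of single-entry maps is also an assumption, so the sketch tolerates both.

```python
# Hypothetical schema, for illustration only
example_schema = {
    "title": {"_type": "String", "_min": 1, "_label": "Title"},
    "section": {
        "_type": "String",
        "_label": "Section",
        "_children": {"abstract": {"_type": "String"}},
    },
}

def iter_fields(schema, prefix=""):
    """Yield (path, min, max) for every Field, applying the documented defaults."""
    for name, field in schema.items():
        path = prefix + name
        yield path, field.get("_min", 0), field.get("_max", 1)
        children = field.get("_children", {})
        if isinstance(children, list):
            # Assumption: a list of single-entry maps is merged into one map
            merged = {}
            for child in children:
                merged.update(child)
            children = merged
        yield from iter_fields(children, path + ".")

for path, minimum, maximum in iter_fields(example_schema):
    print(path, minimum, maximum)
```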

Data Access Policies

Data Access Policies determine both READ and WRITE authorization on Projects, based on the user's Role

Section _dataAccessPolicies of UCD contains a list of Rules in the following format :

  • _policy [Mandatory] : contains the read and write operation policy for the present Rule (allowed values are any, own and none)
  • _roles [Mandatory] : collection of user roles for which the present Rule should be applied
  • _enforcer : additional conditions that should always be applied when accessing data (allowed value is filter)

E.g. The following rule :

  • Applies as default (no roles are specified)
  • Denies any WRITE operation
  • Allows read operations to any Project
  • Filters any access to Projects, selecting only those whose Phase is Published
{"_policy" : {"_read" : "any", "_write" : "none"}, "_roles":[],
      "_enforcer": {"_filter" :  {"_lifecycleInformation._phase" :  {"$eq" :  "Published"}}}},
In [ ]:
print_json(data=UCD["_dataAccessPolicies"])
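
The sketch below illustrates how a client could interpret such rules. The matching semantics are an assumption, not the service's documented behaviour: here the first rule sharing a role with the user wins, and a rule with an empty `_roles` list acts as the default. The `Editor` rule is a hypothetical addition for the example.

```python
example_policies = [
    # Hypothetical rule: editors read everything and write only their own Projects
    {"_policy": {"_read": "any", "_write": "own"}, "_roles": ["Editor"]},
    # Default rule taken from the example above
    {"_policy": {"_read": "any", "_write": "none"}, "_roles": [],
     "_enforcer": {"_filter": {"_lifecycleInformation._phase": {"$eq": "Published"}}}},
]

def select_rule(policies, user_roles):
    """Return the first rule matching a user role, else the default rule (assumed semantics)."""
    default = None
    for rule in policies:
        roles = rule.get("_roles", [])
        if not roles:
            default = default or rule
        elif set(roles) & set(user_roles):
            return rule
    return default

def can(policies, user_roles, operation, is_owner=False):
    """Check a 'read' or 'write' operation against the selected rule."""
    rule = select_rule(policies, user_roles)
    if rule is None:
        return False
    level = rule["_policy"].get("_" + operation, "none")
    return level == "any" or (level == "own" and is_owner)

print(can(example_policies, [], "write"))
print(can(example_policies, ["Editor"], "write", is_owner=True))
```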

Handlers Configuration

Handler configurations are simple JSON objects that wrap the specific configuration of the software components that are going to consume our Projects.

Their default structure is :

  • _id : identifies the target plugin
  • _type : declares the target plugin type
  • _configuration : defines the actual plugin configuration (plugin specific)

Please refer to the specific plugin guide for more information.
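
A component usually needs only its own handler entry. The sketch below selects handlers by `_type` and/or `_id`. It assumes `_handlers` is a list of handler objects; `find_handlers` is a hypothetical helper, and the example UCD reuses identifiers mentioned in this notebook with placeholder configurations.

```python
# Minimal UCD fragment, for illustration only
example_ucd = {
    "_handlers": [
        {"_id": "DEFAULT-SINGLE-STEP", "_type": "LifecycleManagement", "_configuration": {}},
        {"_id": "SDI-Indexer-Plugin", "_type": "Indexer", "_configuration": {}},
        {"_id": "SDI-Default-Materializer", "_type": "Materializer", "_configuration": {}},
    ]
}

def find_handlers(ucd, handler_type=None, handler_id=None):
    """Return the handler entries matching the given _type and/or _id."""
    return [
        handler for handler in ucd.get("_handlers", [])
        if (handler_type is None or handler.get("_type") == handler_type)
        and (handler_id is None or handler.get("_id") == handler_id)
    ]

indexers = find_handlers(example_ucd, handler_type="Indexer")
print([handler["_id"] for handler in indexers])
```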

Access and Querying

UCDs are also cached in the underlying document store Database, allowing for the same querying capabilities as for Projects.

UCDs can be accessed by their cached MongoID at the following endpoint : SERVICE_BASE_ENDPOINT/ucd/MongoID

In [ ]:
#Perform get UCD by ID 
print_json(data=send(method="GET",dest=ucd_endpoint+str(UCD["_mongoId"])))

However, applications might want to perform one of the following queries or a more specific one. Check the Accessing Projects section for more details.

In [ ]:
#Query BY UCID
query = {"filter" : {"_id":{"$eq":ucid}}}
In [ ]:
#Actually perform query (the body is serialized with json.dumps, as in the other query calls)
print_json(data=send(method="POST",data=json.dumps(query),dest=ucd_endpoint+"query"))

Plugins

The CMS service itself only deals with generic logic of Projects lifecycle management. The actual hard work is performed by the Plugins configured in the linked Use Case Descriptor.

In more detail, the basic service logic relies on a Lifecycle Manager for the execution of STEPs and EVENT handling, which in turn may exploit other plugins for its purposes. Projects and documents are only actually updated by the service logic, granting fully centralized error management.

Lifecycle

In this section we are going to introduce some of the most common plugins that can be configured in UCDs. NB : The plugin set can be expanded by the community, both from scratch and by extending the already implemented behaviour.

Lifecycle Managers

Lifecycle Managers are primarily responsible for the execution of a Project workflow.

They define the support for STEPs, EVENTs and the resulting document PHASE, so basically they define the Workflow itself.

Although customizable, workflows share some basic behaviour :

  • Each Project starts in DRAFT PHASE, then it evolves depending on the configured Lifecycle Manager.
  • The following EVENTS are always triggered (even if they can be implemented as NO-OP) :
    • ON INIT
    • ON UPDATE
    • ON DELETE
    • ON DELETE FILESET

NB : ROLE based access to STEPs is entrusted to Lifecycle Managers. Provided implementations exploit the feature BasicRoleManager documented below.

The service is distributed with some default implementations in order to provide support for common situations.

DEFAULT-SINGLE-STEP

DEFAULT-SINGLE-STEP Lifecycle Manager is a simple workflow implementation consisting of a single STEP "Publish".

PUBLISH step:

  • materializes GIS Resources exploiting SDI-Default-Materializer plugin
  • indexes GIS resources in a centroids layer exploiting SDI-Indexer-Plugin plugin
  • sets the Project PHASE to "PUBLISHED"
  • [optionally] sends notifications via Notification plugin

The following events are also managed :

  • ON INIT (basic validation, basic default values evaluation)
  • ON UPDATE (basic validation, basic default values evaluation)
  • ON DELETE (delete all materializations and index references)
  • ON DELETE FILESET (delete Fileset materializations and update index references)

It exploits the feature BasicRoleManager documented below.
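
The workflow described above can be pictured as a very small state machine. The sketch below is illustrative only and is not the plugin's implementation. It assumes that PUBLISH is allowed only from the DRAFT phase, and it leaves out the materialization, indexing and notification side effects.

```python
# Assumed transition table: (current PHASE, STEP) -> resulting PHASE
TRANSITIONS = {("DRAFT", "PUBLISH"): "PUBLISHED"}

def perform_step(phase, step):
    """Return the PHASE reached by executing STEP, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(phase, step)]
    except KeyError:
        raise ValueError(f"STEP {step!r} is not allowed in PHASE {phase!r}") from None

print(perform_step("DRAFT", "PUBLISH"))
```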

DEFAULT-3PHASE

DEFAULT-3PHASE Lifecycle Manager is a moderated workflow implementation which allows for approval or rejection of submitted Drafts.

It can be summarized with the following diagram :

[Diagram: 3Phase]

It is built on top of DEFAULT-SINGLE-STEP, extending its behavior in order to :

  • Manage different indexes (restricted and public) for documents in PHASES "Awaiting Approval" and "Published"
  • Allow for iterative editing cycle between submitting user and moderators

It exploits the feature BasicRoleManager documented below.

Basic Role Manager

Basic role manager is an internal feature of the CMS service. It restricts execution of STEPs based on user Role. It reads the Lifecycle manager configuration expecting a collection of rules like in the following example.

The following configuration allows users with Editor role to only SUBMIT projects, while Moderator and DataAdmin can also PUBLISH

{
      "_id" :  "DEFAULT-SINGLE-STEP",
      "_type" :  "LifecycleManagement",
      "_configuration" : {
          ...
        "step_access" : [
          {"STEP" : "PUBLISH", "roles" :[ "Moderator","DataAdmin"]},
          {"STEP" : "SUBMIT", "roles" :[ "Editor","Moderator","DataAdmin"]}
        ]
      }
    }
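
The check that such a configuration implies can be sketched as follows. The `step_access` content is taken from the example above; `allowed_steps` is a hypothetical helper and only approximates what the service does internally.

```python
example_configuration = {
    "step_access": [
        {"STEP": "PUBLISH", "roles": ["Moderator", "DataAdmin"]},
        {"STEP": "SUBMIT", "roles": ["Editor", "Moderator", "DataAdmin"]},
    ]
}

def allowed_steps(configuration, user_roles):
    """Return the sorted STEPs that at least one of the user's roles may execute."""
    roles = set(user_roles)
    return sorted(
        rule["STEP"]
        for rule in configuration.get("step_access", [])
        if roles & set(rule.get("roles", []))
    )

print(allowed_steps(example_configuration, ["Editor"]))
print(allowed_steps(example_configuration, ["Moderator"]))
```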

Indexer Plugins

Indexer plugins are in charge of managing resources (Indexes) that contain references to the Projects in order to provide some kind of browsing experience through them. They use the information on the selected Project to manage Index entries and (optionally) generate spatialReferences and temporalReferences.

SDI-Indexer-Plugin

This plugin manages centroids layers in the SDI GeoServer, generating the Project spatialReference.

It expects the following UCD Handler Configuration :

  • bboxEvaluation Collection of JSONPaths to use in order to evaluate the Project centroid
  • explicitFieldMapping Collection of JSON Object defining the centroid record fields along with JSONPath for their evaluation
  • additionalLayers Collection of JSON Objects defining additional layers to use for cross-reference querying of the centroid layers

A typical configuration might be as follows :

{
      "_id" : "SDI-Indexer-Plugin",
      "_type" : "Indexer",
      "_configuration" : {
        "bboxEvaluation" : ["$.._bbox"],
        "explicitFieldMapping" : [
          {"name" :  "titolo", "path" :  "$._theDocument.title", "type" : "TEXT"}
        ],
        "jslt" : {},
        "additionalLayers" : [
          {"source" : {"url" :  "..."},"toSetTitle":""}
        ]
      }
    }
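
To illustrate what `explicitFieldMapping` expresses, the sketch below builds a centroid record from a Project by resolving each mapped path. It is not the plugin's implementation: the resolver is a deliberately tiny stand-in that handles only plain `$.a.b` paths (no `..` or `[*]`), and the example Project content is a placeholder.

```python
def resolve(document, path):
    """Resolve a plain '$.a.b' path; return None when a segment is missing."""
    if not path.startswith("$.") or ".." in path or "[" in path:
        raise ValueError("only plain '$.a.b' paths are supported in this sketch")
    current = document
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def centroid_record(project, field_mapping):
    """Build {record field name: value} from an explicitFieldMapping list."""
    return {mapping["name"]: resolve(project, mapping["path"]) for mapping in field_mapping}

# Placeholder Project, for illustration only
example_project = {"_theDocument": {"title": "Sample title"}}
example_mapping = [{"name": "titolo", "path": "$._theDocument.title", "type": "TEXT"}]
record = centroid_record(example_project, example_mapping)
print(record)
```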

It also expects the Lifecycle manager to provide the following call Parameters :

  • Indexing request
    • indexName : must be unique (table in postgis)
    • workspace : GeoServer Workspace
    • centroidRecord : optionally declare the record to index instead of evaluating it

Materializer Plugins

Materializer plugins are in charge of generating resources representing FileSets, and producing their references to be set in the Document itself.

SDI-Default-Materializer

This plugin orchestrates the creation and publication of GIS Layers in the infrastructure's SDI, registering FileSets in its specific target engine (depending on data formats).

Currently supported data formats are :

  • SHP : shape files (along with SHX files if present)

It expects the following UCD Handler Configuration :

  • registeredFileSetPaths Collection of JSONPaths pairs pointing to the FileSets to be Materialized by this plugin.

A typical configuration might be as follows :

{
      "_id" :  "SDI-Default-Materializer",
      "_type" :  "Materializer",
      "_configuration" : {
        "registeredFileSetPaths" : [
          {"schemaField" :  "pianteFineScavo","documentPath" :  "pianteFineScavo[*].fileset"},
          {"schemaField" :  "posizionamentoScavo","documentPath" :  "posizionamentoScavo.fileset"}
        ]
      }
    }
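
The two `documentPath` forms used above (with and without `[*]`) can be understood through the sketch below, which collects the FileSets a path points to. It is an illustration, not the plugin's resolver: it assumes paths are relative to the document, supports only these two forms, and uses placeholder FileSet content.

```python
def resolve_filesets(document, document_path):
    """Collect the values reached by paths such as 'a[*].b' or 'a.b'."""
    nodes = [document]
    for segment in document_path.split("."):
        wildcard = segment.endswith("[*]")
        key = segment[:-3] if wildcard else segment
        next_nodes = []
        for node in nodes:
            if not isinstance(node, dict) or key not in node:
                continue
            value = node[key]
            if wildcard:
                # [*] expands a list into its elements; non-lists yield nothing
                next_nodes.extend(value if isinstance(value, list) else [])
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return nodes

# Placeholder document, for illustration only
example_document = {
    "pianteFineScavo": [{"fileset": {"id": 1}}, {"fileset": {"id": 2}}],
    "posizionamentoScavo": {"fileset": {"id": 3}},
}
print(resolve_filesets(example_document, "pianteFineScavo[*].fileset"))
print(resolve_filesets(example_document, "posizionamentoScavo.fileset"))
```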

Getting installed plugins

The service allows for inspection and management of installed plugins.

Installed plugins descriptors can be obtained at SERVICE_BASE_ENDPOINT/plugins and SERVICE_BASE_ENDPOINT/plugins/pluginID

In [ ]:
print_json(data=send(method="GET",dest=plugins_endpoint))