Setting up the harvesters (ckan container)

Use supervisor to run the fetch and gather consumers in the background:
https://github.com/ckan/ckanext-harvest#setting-up-the-harvesters-on-a-production-server
mjanez 2024-02-21 09:16:47 +01:00
parent 7e5e4e8099
commit fb158d7ce3
19 changed files with 229 additions and 80 deletions


@ -40,7 +40,7 @@ APACHE_PORT=80
APACHE_LOG_DIR=/var/log/apache
#NGINX/APACHE
## Check CKAN__ROOT_PATH and CKANEXT__DCAT__BASE_URI. If you don't need to use domain locations, it is better to use the nginx configuration. Leave blank or use the root `/`.
## Check CKAN__ROOT_PATH and CKANEXT__DCAT__BASE_URI and CKANEXT__SCHEMING_DCAT_GEOMETADATA_BASE_URI. If you don't need to use domain locations, it is better to use the nginx configuration. Leave blank or use the root `/`.
PROXY_SERVER_NAME=localhost
PROXY_CKAN_LOCATION=/catalog
PROXY_PYCSW_LOCATION=/csw
@ -97,6 +97,7 @@ CKAN_SYSADMIN_NAME=ckan_admin
CKAN_SYSADMIN_PASSWORD=test1234
CKAN_SYSADMIN_EMAIL=your_email@example.com
CKAN_STORAGE_PATH=/var/lib/ckan
CKAN_LOGS_PATH=/var/log
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user
@ -124,13 +125,15 @@ CKAN__LOCALE_ORDER="en es pt_BR ja it cs_CZ ca fr el sv sr sr@latin no sk fi ru
CKAN__LOCALES_OFFERED="en es pt_BR ja it cs_CZ ca fr el sv sr sr@latin no sk fi ru de pl nl bg ko_KR hu sa sl lv"
# Extensions
CKAN__PLUGINS="envvars stats text_view image_view webpage_view recline_view resourcedictionary datastore xloader harvest ckan_harvester spatial_metadata spatial_query spatial_harvest_metadata_api csw_harvester waf_harvester doc_harvester resource_proxy geo_view geojson_view wmts_view shp_view dcat dcat_rdf_harvester dcat_json_harvester dcat_json_interface scheming_dcat_datasets scheming_dcat_groups scheming_dcat_organizations scheming_dcat pdf_view pages fluent"
CKAN__PLUGINS="envvars stats text_view image_view webpage_view recline_view resourcedictionary datastore xloader harvest ckan_harvester spatial_metadata spatial_query spatial_harvest_metadata_api csw_harvester waf_harvester doc_harvester resource_proxy geo_view geojson_view wmts_view shp_view dcat dcat_rdf_harvester dcat_json_harvester dcat_json_interface scheming_dcat_datasets scheming_dcat_groups scheming_dcat_organizations scheming_dcat scheming_dcat_ckan_harvester scheming_dcat_csw_harvester pdf_view pages fluent"
# ckanext-harvest
CKAN__HARVEST__MQ__TYPE=redis
CKAN__HARVEST__MQ__HOSTNAME=redis
CKAN__HARVEST__MQ__PORT=6379
CKAN__HARVEST__MQ__REDIS_DB=1
# Clean-up mechanism for the harvest log table. The default is 30 days.
CKAN__HARVEST__LOG_TIMEFRAME=40
# ckanext-xloader
CKANEXT__XLOADER__API_TOKEN=api_token


@ -26,7 +26,7 @@
## Overview
Contains Docker images for the different components of CKAN Cloud and a Docker compose environment (based on [ckan](https://github.com/ckan/ckan)) for development and testing Open Data portals.
>**Warning**:<br>
> [!IMPORTANT]
>This is a **custom installation of Docker Compose** with specific extensions for spatial data and [GeoDCAT-AP](https://github.com/SEMICeu/GeoDCAT-AP)/[INSPIRE](https://github.com/INSPIRE-MIF/technical-guidelines) metadata [profiles](https://en.wikipedia.org/wiki/Geospatial_metadata). For official installations, please have a look: [CKAN documentation: Installation](https://docs.ckan.org/en/latest/maintaining/installing/index.html).
![CKAN Docker Platform](/doc/img/ckan-docker-services.png)
@ -69,7 +69,7 @@ The site is configured using environment variables that you can set in the `.env
### ckan-docker roadmap
Information about extensions installed in the `main` image. More info described in the [Extending the base images](#extending-the-base-images)
>**Note**<br>
> [!NOTE]
> Switch branches to see the `roadmap` for other projects: [ckan-docker/branches](https://github.com/mjanez/ckan-docker/branches)
@ -79,7 +79,7 @@ Information about extensions installed in the `main` image. More info described
| Core + | [Datastore](https://github.com/mjanez/ckan-docker) | 2.9.9 | Completed | ✔️ | ✔️ | Stable installation (Production & Dev images) via Docker Compose. |
| Core + | [~~Datapusher~~](https://github.com/mjanez/ckan-docker) | 0.0.19 | Deprecated | ❌ | ❌ | Updated to [xloader](https://github.com/ckan/ckanext-xloader), an express Loader - quickly load data into DataStore. |
| Extension | [ckanext-xloader](https://github.com/ckan/ckanext-xloader) | 1.0.1 | Completed | ✔️ | ✔️ | Stable installation, a replacement for DataPusher because it offers ten times the speed and more robustness |
| Extension | [ckanext-harvest](https://github.com/ckan/ckanext-harvest) | 1.5.1 | Completed | ✔️ | ✔️ | Stable installation, necessary for the implementation of the Collector ([ogc_ckan](#recollector-ckan)) |
| Extension | [ckanext-harvest](https://github.com/ckan/ckanext-harvest) | v1.5.6 | Completed | ✔️ | ✔️ | Stable installation, necessary for the implementation of the Collector ([ogc_ckan](#recollector-ckan)) |
| Extension | [ckanext-geoview](https://github.com/ckan/ckanext-geoview) | 0.0.20 | Completed | ✔️ | ✔️ | Stable installation. |
| Extension | [ckanext-spatial](https://github.com/ckan/ckanext-spatial) | 2.0.0 | Completed | ✔️ | ✔️ | Stable installation, necessary for the implementation of the Collector ([ogc_ckan](#recollector-ckan)) |
| Extension | [ckanext-dcat](https://github.com/mjanez/ckanext-dcat) | 1.1.0 | Completed | ✔️ | ✔️ | Stable installation, include DCAT-AP 2.1 profile compatible with GeoDCAT-AP. |
@ -103,7 +103,7 @@ To upgrade Docker Engine, first run sudo `apt-get update`, then follow the [inst
To verify a successful Docker installation, run `docker run hello-world` and `docker version`. These commands should output
versions for client and server.
>**Note**<br>
> [!NOTE]
> Learn more about [Docker](#docker-basic-commands)/[Docker Compose](#docker-compose-basic-commands) basic commands.
>
@ -128,10 +128,10 @@ Use this if you are a maintainer and will not be making code changes to CKAN or
- **Apache HTTP Server**: Replace the [`.env`](/.env) with the [`/samples/.env.apache.example`](/samples/.env.apache.example) and modify the variables as needed.
>**Note**:<br>
> [!NOTE]
> Please note that when accessing CKAN directly (via a browser) ie: not going through Apache/NGINX you will need to make sure you have "ckan" set up to be an alias to localhost in the local hosts file. Either that or you will need to change the `.env` entry for `CKAN_SITE_URL`
>**Warning**:<br>
> [!WARNING]
> Using the default values in the `.env` file will get you a working CKAN instance. There is a sysadmin user created by default with the values defined in `CKAN_SYSADMIN_NAME` and `CKAN_SYSADMIN_PASSWORD` (`ckan_admin` and `test1234` by default). All envvars with `API_TOKEN` are automatically regenerated when CKAN is loaded, no editing is required.
>
>**This should be obviously changed before running this setup as a public CKAN instance.**
@ -141,7 +141,7 @@ Use this if you are a maintainer and will not be making code changes to CKAN or
docker compose build
```
>**Note**<br>
> [!NOTE]
> You can use a [deploy in 5 minutes](#quick-mode) if you just want to test the package.
4. Start the containers:
@ -153,11 +153,11 @@ This will start up the containers in the current window. By default the containe
using a different colour. You could also use the -d "detach mode" option ie: `docker compose up -d` if you wished to use the current
window for something else.
>**Note**<br>
> [!NOTE]
> * Or `docker compose up --build` to build & up the containers.
> * Or `docker compose -f docker-compose.apache.yml up -d --build` to use the Apache HTTP Server version.
>**Note**<br>
> [!NOTE]
> Learn more about configuring this ckan docker:
> - [Backup the CKAN Database](#ckan-backups)
> - [Configuring a docker compose service to start on boot](#docker-compose-configure-a-docker-compose-service-to-start-on-boot)
@ -229,7 +229,7 @@ The Docker image config files used to build your CKAN project are located in the
* Any custom changes to the scripts run during container start up can be made to scripts in the `setup/` directory. For instance if you wanted to change the port on which CKAN runs you would need to make changes to the Docker Compose yaml file, and the `start_ckan.sh.override` file. Then you would need to add the following line to the Dockerfile ie: `COPY setup/start_ckan.sh.override ${APP_DIR}/start_ckan.sh`. The `start_ckan.sh` file in the locally built image would override the `start_ckan.sh` file included in the base image
>**Note**<br>
> [!TIP]
> If you get an error like ` doesn't have execute permissions`:
>
>```log
@ -309,7 +309,7 @@ ckan
```
>**Note**:<br>
> [!NOTE]
> Git diff is a command to output the changes between two sources inside the Git repository. The data sources can be two different branches, commits, files, etc.
> * Show changes between working directory and staging area:
> `git diff > [file.patch]`
@ -432,6 +432,12 @@ Available components:
* **pycsw**: The pycsw app. An [OARec](https://ogcapi.ogc.org/records) and [OGC CSW](https://opengeospatial.org/standards/cat) server implementation written in Python.
* **ckan2pycsw**: Software to achieve interoperability with the open data portals based on CKAN. To do this, ckan2pycsw reads data from an instance using the CKAN API, generates ISO-19115/ISO-19139 metadata using [pygeometa](https://geopython.github.io/pygeometa/), or a custom schema that is based on a customized CKAN schema, and populates a [pycsw](https://pycsw.org/) instance that exposes the metadata using CSW and OAI-PMH.
### Harvester consumers on a deployed CKAN
[ckanext-harvest supervisor](https://github.com/ckan/ckanext-harvest#setting-up-the-harvesters-on-a-production-server) allows you to harvest metadata from multiple sources on a production deployment. Here the gather and fetch consumers run [as supervisor workers in the `ckan` container](./ckan/setup/workers/harvester.conf). The `ckanext-harvest` extension and other custom harvesters ([`ckanext-scheming_dcat`](https://github.com/mjanez/ckanext-scheming_dcat?tab=readme-ov-file#harvesters) or [`ckanext-dcat`](https://github.com/ckan/ckanext-dcat#rdf-dcat-harvester)) are also included in the CKAN Docker images.
> [!TIP]
> To enable harvesters, add the `harvest` plugin to the `CKAN__PLUGINS` variable in the `.env` file: https://github.com/mjanez/ckan-docker/blob/a18e0c80d9f16b6d9b6471e3148d48fcb83712bd/.env.example#L126-L127
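
A quick way to confirm the setting before starting the stack is to test the plugin list for the exact `harvest` name. The sketch below is illustrative only: `has_plugin` is a hypothetical helper, not part of CKAN or this repository, and the fallback plugin list is a shortened sample rather than the full value from `.env`.

```shell
# Hypothetical helper: succeed when a plugin name appears as a whole word
# in a space-separated plugin list (the format used by CKAN__PLUGINS).
has_plugin() {
    plugin="$1"
    plugins="$2"
    case " $plugins " in
        *" $plugin "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: fall back to a shortened sample list when the variable is unset.
CKAN__PLUGINS="${CKAN__PLUGINS:-envvars harvest ckan_harvester}"

if has_plugin harvest "$CKAN__PLUGINS"; then
    echo "harvest plugin enabled"
else
    echo "harvest plugin missing from CKAN__PLUGINS" >&2
fi
```

Whole-word matching matters here because harvester plugin names such as `ckan_harvester` contain `harvest` as a substring, so a plain substring search could report a false positive.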
## ckan-docker tips
### CKAN. Backups
@ -474,7 +480,7 @@ PostgreSQL offers the command line tools [`pg_dump`](https://www.postgresql.org/
- `your_postgres_password`: The password for the PostgreSQL user.
- `/path/to/your/backup/directory`: The path to the directory where you want to store the backup files.
>**Warning**<br>
> [!WARNING]
> If you have changed the values of the PostgreSQL container, database or user, change them too.
> Check that `zip` package is installed: `sudo apt-get install zip`
@ -498,14 +504,14 @@ PostgreSQL offers the command line tools [`pg_dump`](https://www.postgresql.org/
0 0 * * * /path/to/your/script/ckan_backup_custom.sh
```
>**Info**<br>
> [!NOTE]
> Replace `/path/to/your/script` with the actual path to the `ckan_backup_custom.sh` script.
8. Save and close the file.
The cronjob is now set up and will backup your CKAN PostgreSQL database daily at midnight using the custom format. The backups will be stored in the specified directory with the timestamp in the filename.
>**Info**<br>
> [!NOTE]
> Sample scripts for backing up CKAN: [`doc/scripts`](doc/scripts)
@ -691,7 +697,7 @@ To have Docker Compose run automatically when you reboot a machine, you can foll
## CKAN API
>**Note**<br>
> [!NOTE]
>`params`: Parameters to pass to the action function. The parameters are specific to each action function.
>* `fl` (text): Fields of the dataset to return. The parameter controls which fields are returned in the solr query. `fl` can be `None` or a list of result fields, such as: `id,name,extras_custom_schema_field`.
>


@ -11,29 +11,30 @@ WORKDIR ${APP_DIR}
# requirements.txt files fixed until next releases
COPY req_fixes req_fixes
# Extensions
### XLoader - 1.0.1 ###
### Harvester - v1.5.1 ###
### Geoview - v0.0.20 ###
### Spatial - v2.1.1 ###
### DCAT - v1.2.0-geodcatap (GeoDCAT-AP/NTI-RISP extended version) ###
### Scheming - release-3.0.0 ###
### Resource dictionary - v1.0.1 ###
### Pages - v0.5.2 ###
### PDFView - 0.0.8 ###
### Fluent - v1.0.1 (Forked stable version) ###
### Scheming DCAT - v2.0.0 (GeoDCAT-AP/NTI-RISP extended version) ###
### SPARQL Interface - 2.0.1 ###
# CKAN configuration & extensions
## XLoader - 1.0.1 ##
## Harvest - v1.5.6 (Worker with supervisor) ##
## Geoview - v0.0.20 ##
## Spatial - v2.1.1 ##
## DCAT - v1.2.0-geodcatap (GeoDCAT-AP/NTI-RISP extended version) ##
## Scheming - release-3.0.0 ##
## Resource dictionary - v1.0.1 ##
## Pages - v0.5.2 ##
## PDFView - 0.0.8 ##
## Fluent - v1.0.1 (Forked stable version) ##
## Scheming DCAT - v2.0.0 (GeoDCAT-AP/NTI-RISP extended version) ##
RUN echo ${TZ} > /etc/timezone && \
if ! [ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime ; fi && \
if ! [ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime; fi && \
# Remove apk cache
rm -rf /var/cache/apk/* && \
# Install CKAN extensions
echo "ckan/ckanext-xloader" && \
pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-xloader.git@1.0.1#egg=ckanext-xloader && \
pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-xloader/requirements.txt && \
pip3 install --no-cache-dir -U requests[security] && \
echo "ckan/ckanext-harvest" && \
pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-harvest.git@v1.5.1#egg=ckanext-harvest && \
pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-harvest/pip-requirements.txt && \
pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-harvest.git@v1.5.6#egg=ckanext-harvest && \
pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-harvest/requirements.txt && \
echo "ckan/ckanext-geoview" && \
pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-geoview.git@v0.0.20#egg=ckanext-geoview && \
echo "ckan/ckanext-spatial" && \
@ -66,11 +67,20 @@ COPY setup/who.ini ./
COPY patches patches
RUN for d in $APP_DIR/patches/*; do \
if [ -d $d ]; then \
if [ -d $d ]; then \
for f in `ls $d/*.patch | sort -g`; do \
cd $SRC_DIR/`basename "$d"` && echo "$0: Applying patch $f to $SRC_DIR/`basename $d`"; patch -p1 < "$f" ; \
done ; \
fi ; \
cd $SRC_DIR/`basename "$d"` && echo "$0: Applying patch $f to $SRC_DIR/`basename $d`" && patch -p1 < "$f"; \
done; \
fi; \
done
# Workers
## Update start_ckan.sh with custom workers
COPY setup/start_ckan.sh.override ${APP_DIR}/start_ckan.sh
RUN chmod +x ${APP_DIR}/start_ckan.sh
## Harvester
COPY setup/workers/harvester.conf /etc/supervisord.d/harvester.conf
# Start CKAN
CMD ["/bin/sh", "-c", "$APP_DIR/start_ckan.sh"]


@ -1,14 +1,18 @@
FROM ghcr.io/mjanez/ckan-base-spatial:ckan-2.9.9-dev
LABEL maintainer="mnl.janez@gmail.com"
# Set up environment variables
ENV APP_DIR=/srv/app \
TZ=UTC \
SRC_EXTENSIONS_DIR=/srv/app/src_extensions
# Set working directory
WORKDIR ${APP_DIR}
RUN echo ${TZ} > /etc/timezone && \
set -ex && apk --no-cache add sudo && \
# Make sure both files are not exactly the same
if ! [ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime ; fi
if ! [ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime; fi && \
apk --no-cache add sudo && \
# Remove apk cache
rm -rf /var/cache/apk/*
# Install any extensions needed by your CKAN instance
# - Make sure to add the plugins to CKAN__PLUGINS in the .env file
@ -50,26 +54,29 @@ RUN echo ${TZ} > /etc/timezone && \
COPY docker-entrypoint.d/* /docker-entrypoint.d/
# Update who.ini with PROXY_CKAN_LOCATION
COPY setup/who.ini ${APP_DIR}/
COPY setup/who.ini ./
# Override start_ckan.sh with DEV sh
COPY setup/start_ckan_development.sh.override ${APP_DIR}/start_ckan_development.sh
RUN chmod +x ${APP_DIR}/start_ckan_development.sh
COPY setup/start_ckan_development.sh.override ./start_ckan_development.sh
RUN chmod +x ./start_ckan_development.sh
## Harvester
COPY setup/workers/harvester.conf /etc/supervisord.d/harvester.conf
# Apply any patches needed to CKAN core or any of the built extensions (not the
# runtime mounted ones)
COPY patches ${APP_DIR}/patches
COPY patches patches
RUN for d in $APP_DIR/patches/*; do \
if [ -d $d ]; then \
for f in `ls $d/*.patch | sort -g`; do \
if [ -d $SRC_DIR/`basename "$d"` ]; then \
cd $SRC_DIR/`basename "$d"` && \
echo "$0: Applying patch $f to $SRC_DIR/`basename $d`" && \
patch -p1 < "$f" ; \
else \
echo "$0: Skipping patch $f because directory $SRC_DIR/`basename $d` does not exist. Built the extension: `basename $d`" ; \
fi \
done ; \
fi ; \
done
if [ -d $d ]; then \
for f in `ls $d/*.patch | sort -g`; do \
if [ -d $SRC_DIR/`basename "$d"` ]; then \
cd $SRC_DIR/`basename "$d"` && \
echo "$0: Applying patch $f to $SRC_DIR/`basename $d`" && \
patch -p1 < "$f" ; \
else \
echo "$0: Skipping patch $f because directory $SRC_DIR/`basename $d` does not exist. Built the extension: `basename $d`" ; \
fi \
done ; \
fi ; \
done


@ -1,7 +1,7 @@
#!/bin/bash
# Update who.ini when exists PROXY_CKAN_LOCATION
echo "Update who.ini"
echo "[docker-entrypoint.00_update_who] Update who.ini"
if [ -n "$PROXY_CKAN_LOCATION" ] && [ "$PROXY_CKAN_LOCATION" != "/" ]; then
sed -i "s|\${WHO_LOCATION}|$PROXY_CKAN_LOCATION|g" "${APP_DIR}/who.ini";
else


@ -11,20 +11,20 @@ for TOKEN_ID in $TOKEN_IDS
do
ckan -c $CKAN_INI user token revoke $TOKEN_ID
if [ $? -eq 0 ]; then
echo "API Token $TOKEN_ID has been revoked"
echo "[docker-entrypoint.01_setup_xloader] API Token $TOKEN_ID has been revoked"
fi
done
# Add ckanext.xloader.api_token to the CKAN config file
echo "Loading ckanext-xloader settings in the CKAN config file"
echo "[docker-entrypoint.01_setup_xloader] Loading ckanext-xloader settings in the CKAN config file"
ckan config-tool $CKAN_INI \
"ckanext.xloader.api_token=xxx" \
"ckanext.xloader.jobs_db.uri=$CKANEXT__XLOADER__JOBS__DB_URI"
# Create ckanext-xloader API_TOKEN
echo "Set up ckanext.xloader.api_token in the CKAN config file"
echo "[docker-entrypoint.01_setup_xloader] Set up ckanext.xloader.api_token in the CKAN config file"
ckan config-tool $CKAN_INI "ckanext.xloader.api_token=$(ckan -c $CKAN_INI user token add ckan_admin xloader | tail -n 1 | tr -d '\t')"
#TODO: Setup worker background
#echo "Set up CKAN jobs worker"
#echo "[docker-entrypoint.01_setup_xloader] Set up CKAN jobs worker"
#ckan -c $CKAN_INI jobs worker default


@ -1,10 +1,10 @@
#!/bin/bash
# Update ckanext-scheming and ckanext-scheming_dcat settings defined in the env var
echo "Set up ckanext-scheming_dcat. Clear index"
echo "[docker-entrypoint.02_setup_scheming] Clear index"
ckan -c $CKAN_INI search-index clear
echo "Loading ckanext-scheming and ckanext-scheming_dcat settings into ckan.ini"
echo "[docker-entrypoint.02_setup_scheming] Loading ckanext-scheming and ckanext-scheming_dcat settings into ckan.ini"
ckan config-tool $CKAN_INI \
"scheming.dataset_schemas=$CKANEXT__SCHEMING_DCAT_DATASET_SCHEMA" \
"scheming.group_schemas=$CKANEXT__SCHEMING_DCAT_GROUP_SCHEMAS" \
@ -15,5 +15,5 @@ ckan config-tool $CKAN_INI \
"scheming_dcat.group_custom_facets=$CKANEXT__SCHEMING_DCAT_GROUP_CUSTOM_FACETS" \
"scheming_dcat.geometadata_base_uri=$CKANEXT__SCHEMING_DCAT_GEOMETADATA_BASE_URI"
echo "ckanext-scheming_dcat. Rebuild index"
echo "[docker-entrypoint.02_setup_scheming] Rebuild index"
ckan -c $CKAN_INI search-index rebuild


@ -1,7 +1,7 @@
#!/bin/bash
# Add ckanext-dcat settings to the CKAN config file
echo "Loading ckanext-dcat settings in the CKAN config file"
echo "[docker-entrypoint.03_setup_dcat] Loading ckanext-dcat settings in the CKAN config file"
ckan config-tool $CKAN_INI \
"ckanext.dcat.base_uri = $CKANEXT__DCAT__BASE_URI" \
"ckanext.dcat.catalog_endpoint = $CKANEXT__DCAT__DEFAULT_CATALOG_ENDPOINT" \


@ -3,7 +3,7 @@
#TODO: Correct views.
# Add CKAN Resource views to the CKAN config file
echo "Loading resource views in the CKAN config file"
echo "[docker-entrypoint.04_setup_preview] Loading resource views in the CKAN config file"
ckan config-tool $CKAN_INI \
"ckan.views.default_views = $CKAN__VIEWS__DEFAULT_VIEWS" \
"ckan.preview.json_formats = $CKAN__PREVIEW__JSON_FORMATS" \
@ -12,7 +12,7 @@ ckan config-tool $CKAN_INI \
"ckan.preview.loadable = $CKAN__PREVIEW__LOADABLE"
# Add CKAN Resource geoviews to the CKAN config file
echo "Loading geoviews in the CKAN config file"
echo "[docker-entrypoint.04_setup_preview] Loading geoviews in the CKAN config file"
ckan config-tool $CKAN_INI \
"ckanext.geoview.ol_viewer.formats = $CKANEXT__GEOVIEW__OL_VIEWER__FORMATS" \
"ckanext.geoview.shp_viewer.srid = $CKANEXT__GEOVIEW__SHP_VIEWER__SRID" \


@ -1,7 +1,7 @@
#!/bin/bash
# Add pages CKAN config file (https://github.com/ckan/ckanext-pages#configuration)
echo "Loading pages config in the CKAN config file"
echo "[docker-entrypoint.05_setup_pages] Loading pages config in the CKAN config file"
ckan config-tool $CKAN_INI \
"ckan.pages.allow_html = $CKANEXT__PAGES__ALOW_HTML" \
"ckanext.pages.organization = $CKANEXT__PAGES__ORGANIZATION" \


@ -0,0 +1,72 @@
diff --git a/ckanext/harvest/templates/source/new.html b/ckanext/harvest/templates/source/new.html
index b7feb3d..b773a44 100644
--- a/ckanext/harvest/templates/source/new.html
+++ b/ckanext/harvest/templates/source/new.html
@@ -24,12 +24,18 @@
<div class="module-content">
<p>
{% trans %}
- Harvest sources allow importing remote metadata into this catalog.
- Remote sources can be other catalogs such as other CKAN instances, CSW
- servers or Web Accessible Folders (WAF) (depending on the actual
- harvesters enabled for this instance).
+ Harvest sources allow importing remote metadata into this catalog. Remote sources can be other catalogs such as other CKAN instances, CSW servers, XML metadata files, XLSX with metadata records or Web Accessible Folder (WAF).
{% endtrans %}
</p>
+
+ <p>
+ {{ _('Depending on the actual harvesters enabled for this instance. eg: ') }}
+ <ul>
+ <li><a href="https://github.com/mjanez/ckanext-scheming_dcat?tab=readme-ov-file#harvesters" target="_blank">ckanext-scheming_dcat</a></li>
+ <li><a href="https://github.com/ckan/ckanext-dcat?tab=readme-ov-file#rdf-dcat-harvester" target="_blank">ckanext-dcat</a></li>
+ <li><a href="https://docs.ckan.org/projects/ckanext-spatial/en/latest/harvesters.html" target="_blank">ckanext-spatial</a></li>
+ </ul>
+ </p>
</div>
</section>
{% endblock %}
diff --git a/ckanext/harvest/templates/source/new_source_form.html b/ckanext/harvest/templates/source/new_source_form.html
index 324d012..8a500b9 100644
--- a/ckanext/harvest/templates/source/new_source_form.html
+++ b/ckanext/harvest/templates/source/new_source_form.html
@@ -26,7 +26,7 @@
{{ form.markdown('notes', id='field-notes', label=_('Description'), value=data.notes, error=errors.notes) }}
<div class="harvest-types form-group control-group">
- <label class="control-label">Source type</label>
+ <label class="control-label">{{ _('Source type') }}</label>
<div class="controls">
{% for harvester in h.harvesters_info() %}
{% set checked = False %}
diff --git a/ckanext/harvest/templates/source/search.html b/ckanext/harvest/templates/source/search.html
index a929943..06cb373 100644
--- a/ckanext/harvest/templates/source/search.html
+++ b/ckanext/harvest/templates/source/search.html
@@ -44,7 +44,26 @@
-{% block secondary_content %}
+ {% block secondary_content %}
+ <section class="module module-narrow">
+ <h2 class="module-heading"><i class="fa fa-lg fa-info-circle icon-large icon-info-sign"></i> {{ _('Harvest sources') }}</h2>
+ <div class="module-content">
+ <p>
+ {% trans %}
+ Harvest sources allow importing remote metadata into this catalog. Remote sources can be other catalogs such as other CKAN instances, CSW servers, XML metadata files, XLSX with metadata records or Web Accessible Folder (WAF).
+ {% endtrans %}
+ </p>
+
+ <p>
+ {{ _('Depending on the actual harvesters enabled for this instance. eg: ') }}
+ <ul>
+ <li><a href="https://github.com/mjanez/ckanext-scheming_dcat?tab=readme-ov-file#harvesters" target="_blank">ckanext-scheming_dcat</a></li>
+ <li><a href="https://github.com/ckan/ckanext-dcat?tab=readme-ov-file#rdf-dcat-harvester" target="_blank">ckanext-dcat</a></li>
+ <li><a href="https://docs.ckan.org/projects/ckanext-spatial/en/latest/harvesters.html" target="_blank">ckanext-spatial</a></li>
+ </ul>
+ </p>
+ </div>
+ </section>
{% for facet in c.facet_titles %}
{{ h.snippet('snippets/facet_list.html', title=c.facet_titles[facet], name=facet, alternative_url=h.url_for('{0}.search'.format(c.dataset_type))) }}
{% endfor %}


@ -1,8 +1,5 @@
#!/bin/sh
# Add ckan.datapusher.api_token to the CKAN config file (updated with corrected value later)
ckan config-tool $CKAN_INI ckan.datapusher.api_token=xxx
# Set up the Secret key used by Beaker and Flask
# This can be overridden using a CKAN___BEAKER__SESSION__SECRET env var
if grep -E "beaker.session.secret ?= ?$" ckan.ini
@ -16,7 +13,7 @@ then
fi
# Run the prerun script to init CKAN and create the default admin user
sudo -u ckan -EH python3 prerun.py
python3 prerun.py
# Run any startup scripts provided by images extending this one
if [[ -d "/docker-entrypoint.d" ]]
@ -31,6 +28,14 @@ then
done
fi
# Create Harvester logs directory and change its ownership
mkdir -p $CKAN_LOGS_PATH/harvester
chown -R ckan:ckan $CKAN_LOGS_PATH/harvester
# Create xloader logs directory and change its ownership
mkdir -p $CKAN_LOGS_PATH/xloader
chown -R ckan:ckan $CKAN_LOGS_PATH/xloader
# Set the common uwsgi options
UWSGI_OPTS="--plugins http,python \
--socket /tmp/uwsgi.sock \
@ -48,7 +53,15 @@ then
# Start supervisord
supervisord --configuration /etc/supervisord.conf &
# Start uwsgi
sudo -u ckan -EH uwsgi $UWSGI_OPTS
uwsgi $UWSGI_OPTS
else
echo "[prerun] failed...not starting CKAN."
fi
fi
# Workers
## Start the Harvester worker
echo "[prerun.workers] Starting the CKAN Harvester worker"
ckan harvester run
## Add harvester to crontab
crontab -l | { cat; echo "0 */2 * * * ckan harvester run"; } | crontab -
## Clean-up mechanism for the harvest log table. 'ckan.harvest.log_timeframe'. The default time frame is 30 days
crontab -l | { cat; echo "0 5 * * * ckan harvester clean-harvest-log"; } | crontab -
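
Because this script runs on every container start, the `crontab -l | { cat; echo ...; } | crontab -` pattern above may append the same entry again after a restart if the crontab persists. A minimal sketch of a duplicate-safe variant follows. `merge_cron_entry` is a hypothetical helper name, not something shipped by CKAN or this repository, and the demonstration works on sample text rather than the real crontab.

```shell
# Sketch: merge a cron entry into existing crontab text without duplicating it.
# Reads the current crontab on stdin and writes the merged crontab to stdout.
merge_cron_entry() {
    entry="$1"
    existing="$(cat)"
    if printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        # Entry already present: emit the crontab unchanged.
        printf '%s\n' "$existing"
    elif [ -n "$existing" ]; then
        # Other entries exist: append the new one at the end.
        printf '%s\n%s\n' "$existing" "$entry"
    else
        # Empty crontab: the new entry is the whole file.
        printf '%s\n' "$entry"
    fi
}

# Demonstration on sample text instead of the real crontab.
sample='0 */2 * * * ckan harvester run'
printf '%s\n' "$sample" | merge_cron_entry '0 */2 * * * ckan harvester run'
printf '%s\n' "$sample" | merge_cron_entry '0 5 * * * ckan harvester clean-harvest-log'
```

In the entrypoint this could be wired as `crontab -l 2>/dev/null | merge_cron_entry "0 */2 * * * ckan harvester run" | crontab -`, which keeps the shape of the original pipeline while skipping entries that are already installed.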


@ -45,9 +45,6 @@ done
echo "Enabling debug mode"
ckan config-tool $CKAN_INI -s DEFAULT "debug = true"
# Add ckan.datapusher.api_token to the CKAN config file (updated with corrected value later)
ckan config-tool $CKAN_INI ckan.datapusher.api_token=xxx
# Set up the Secret key used by Beaker and Flask
# This can be overridden using a CKAN___BEAKER__SESSION__SECRET env var
if grep -E "beaker.session.secret ?= ?$" ckan.ini
@ -74,7 +71,7 @@ ckan config-tool $SRC_DIR/ckan/test-core.ini \
"ckan.redis.url = $TEST_CKAN_REDIS_URL"
# Run the prerun script to init CKAN and create the default admin user
sudo -u ckan -EH python3 prerun.py
python3 prerun.py
# Run any startup scripts provided by images extending this one
if [[ -d "/docker-entrypoint.d" ]]
@ -89,8 +86,24 @@ then
done
fi
# Create Harvester logs directory and change its ownership
mkdir -p $CKAN_LOGS_PATH/harvester
chown -R ckan:ckan $CKAN_LOGS_PATH/harvester
# Create xloader logs directory and change its ownership
mkdir -p $CKAN_LOGS_PATH/xloader
chown -R ckan:ckan $CKAN_LOGS_PATH/xloader
# Start supervisord
supervisord --configuration /etc/supervisord.conf &
# Start the development server with automatic reload
sudo -u ckan -EH ckan -c $CKAN_INI run -H 0.0.0.0
# Workers
## Start the Harvester worker
echo "[prerun.workers] Starting the CKAN Harvester worker"
ckan harvester run
## Clean-up mechanism for the harvest log table
ckan harvester clean-harvest-log
# Start the development server as the ckan user with automatic reload
su ckan -c "/usr/bin/ckan -c $CKAN_INI run -H 0.0.0.0"


@ -0,0 +1,19 @@
[program:ckan_gather_consumer]
command=ckan harvester gather-consumer
user=ckan
numprocs=1
stdout_logfile=/var/log/harvester/gather_consumer.log
stderr_logfile=/var/log/harvester/gather_consumer.err.log
autostart=true
autorestart=true
startsecs=10
[program:ckan_fetch_consumer]
command=ckan harvester fetch-consumer
user=ckan
numprocs=1
stdout_logfile=/var/log/harvester/fetch_consumer.log
stderr_logfile=/var/log/harvester/fetch_consumer.err.log
autostart=true
autorestart=true
startsecs=10
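
To check which workers this file declares, the program names can be read straight from the section headers. The sketch below assumes the file is installed at `/etc/supervisord.d/harvester.conf`, as in the Dockerfile `COPY` instruction of this commit. `list_programs` and the `HARVESTER_CONF` override are hypothetical names used only for illustration.

```shell
# Sketch: list the [program:...] sections declared in a supervisor config file.
list_programs() {
    # Print only the name captured from lines of the form [program:NAME].
    sed -n 's/^\[program:\(.*\)\]$/\1/p' "$1"
}

# Assumption: default to the path used by the Dockerfile in this commit.
CONF="${HARVESTER_CONF:-/etc/supervisord.d/harvester.conf}"

if [ -f "$CONF" ]; then
    list_programs "$CONF"
else
    echo "config not found: $CONF" >&2
fi
```

Inside a running container, `supervisorctl status` reports whether each listed program is actually running, assuming the image's `/etc/supervisord.conf` exposes a control socket.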


@ -3,6 +3,7 @@ version: "3"
volumes:
ckan_storage:
ckan_logs:
pg_data:
solr_data:
@ -57,6 +58,7 @@ services:
condition: service_healthy
volumes:
- ckan_storage:/var/lib/ckan
- ckan_logs:/var/log
restart: unless-stopped
healthcheck:
test: ["CMD", "wget", "-qO", "/dev/null", "http://localhost:${CKAN_PORT}"]


@ -89,6 +89,7 @@ CKAN_SYSADMIN_NAME=ckan_admin
CKAN_SYSADMIN_PASSWORD=test1234
CKAN_SYSADMIN_EMAIL=your_email@example.com
CKAN_STORAGE_PATH=/var/lib/ckan
CKAN_LOGS_PATH=/var/log
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user


@ -97,6 +97,7 @@ CKAN_SYSADMIN_NAME=ckan_admin
CKAN_SYSADMIN_PASSWORD=test1234
CKAN_SYSADMIN_EMAIL=your_email@example.com
CKAN_STORAGE_PATH=/var/lib/ckan
CKAN_LOGS_PATH=/var/log
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user


@ -89,6 +89,7 @@ CKAN_SYSADMIN_NAME=ckan_admin
CKAN_SYSADMIN_PASSWORD=test1234
CKAN_SYSADMIN_EMAIL=your_email@example.com
CKAN_STORAGE_PATH=/var/lib/ckan
CKAN_LOGS_PATH=/var/log
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user


@ -97,6 +97,7 @@ CKAN_SYSADMIN_NAME=ckan_admin
CKAN_SYSADMIN_PASSWORD=test1234
CKAN_SYSADMIN_EMAIL=your_email@example.com
CKAN_STORAGE_PATH=/var/lib/ckan
CKAN_LOGS_PATH=/var/log
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user