Merge pull request #21 from keitaroinc/restructure-layout

Restructure the layout and add compose and examples
This commit is contained in:
Благоја Стојкоски 2020-09-11 14:55:48 +02:00 committed by GitHub
commit c30da16a5d
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
38 changed files with 2465 additions and 58 deletions

121
Readme.md
View File

@ -1,58 +1,93 @@
# Docker ckan image ![Docker Pulls](https://img.shields.io/docker/pulls/keitaro/ckan.svg) # Dockerized CKAN ![Docker Pulls](https://img.shields.io/docker/pulls/keitaro/ckan.svg)
This repository contains base docker images, examples and docker-compose used to build and run CKAN.
We build and publish docker images built using this repository to Dockerhub:
- [CKAN docker images](https://hub.docker.com/r/keitaro/ckan).
- [Datapusher docker images](https://hub.docker.com/r/keitaro/ckan-datapusher)
Looking to run CKAN on Kubernetes? Check out our [CKAN Helm Chart](https://github.com/keitaroinc/ckan-helm)!
## Overview ## Overview
All images are based on [Alpine Linux](https://alpinelinux.org/) and include only required extensions to start a CKAN instance. The docker images are built using a multi-stage docker approach in order to produce slim production grade docker images with the right libraries and configuration. This multi-stage approach allows us to build python binary wheels in the build stages that later on we install in the main stage.
This repository contains base docker image used to build CKAN instances. It's based on [Alpine Linux](https://alpinelinux.org/) and includes only required extensions to start CKAN instance. Directory layout:
- [compose](./compose) - contains a docker-compose setup allowing users to spin up a CKAN setup easily using [docker-compose](https://docs.docker.com/compose/)
- [images](./images) - includes docker contexts for building all supported CKAN versions and datapusher
- [examples](./examples) - includes examples on how to extend the CKAN docker images and how to run them
## Running CKAN using docker-compose
To start CKAN using docker-compose, simply change into the *compose* directory and run
```sh
cd compose
docker-compose build
docker-compose up
```
Check if CKAN was succesfuly started on http://localhost:5000.
### Configuration
In order to configure CKAN within docker-compose we use both build/up time variables loaded via the [.env](./compose/.env) file, and runtime variables loaded via the [.ckan-env](./compose/.ckan-env) file.
Variables in the [.env](./compose/.env) file are loaded when running `docker-compose build` and `docker-compose up`, while variables in [.ckan-env](./compose/.ckan-env) file are used withing the CKAN container at runtime to configure CKAN and CKAN extensions using [ckanext-envvars](https://github.com/okfn/ckanext-envvars).
## Extending CKAN docker images
Check some examples of extending CKAN docker images in the [examples](./examples) directory.
We recommend to use a multi-stage approach to extend the docker images that we provide here. To extend the images the following Dockerfile structure is recommended:
```docker
###################
### Extensions ####
###################
FROM keitaro/ckan:2.9.0 as extbuild
# Switch to the root user
USER root
# Install any system packages necessary to build extensions
RUN apk add --no-cache python3-dev
# Fetch and build the custom CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+https://github.com/acmecorp/ckanext-acme@0.0.1#egg=ckanext-acme
############
### MAIN ###
############
FROM keitaro/ckan:2.9.0
# Add the custom extensions to the plugins list
ENV CKAN__PLUGINS envvars image_view text_view recline_view datastore datapusher acme
# Switch to the root user
USER root
COPY --from=extbuild /wheels /srv/app/ext_wheels
# Install and enable the custom extensions
RUN pip install --no-index --find-links=/srv/app/ext_wheels ckanext-acme && \
ckan -c ${APP_DIR}/production.ini config-tool "ckan.plugins = ${CKAN__PLUGINS}" && \
chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/ext_wheels
# Switch to the ckan user
USER ckan
```
### Adding prerun scripts
You can add scripts to CKAN custom images and copy them to the *docker-entrypoint.d* directory. Any *.sh or *.py file in that directory will be executed after the main initialization script (prerun.py) is executed.
## Build ## Build
To build a CKAN image run:
To create new image run:
```sh ```sh
docker build --tag ckan-2.8.2 . docker build --tag keitaro/ckan:2.9.0 images/ckan/2.9
``` ```
The -tag ckan-2.8.2 flag sets the image name to ckan-2.8.2 and the dot ( “.” ) at the end tells docker build to look into the current directory for Dockerfile and related contents. The -tag keitaro/ckan:2.9.0 flag sets the image name to ketiaro/ckan:2.9.0 and 'images/ckan/2.9' at the end tells docker build to use the context into the specified directory where the Dockerfile and related contents are.
## List
Check if the image shows up in the list of images:
```sh
docker images
```
## Run
To start and test newly created image run:
```sh
docker run ckan-2.8.2
```
Check if CKAN was succesfuly started on http://localhost:5000. The ckan site url is configured in ENV CKAN_SITE_URL.
## Upload to DockerHub ## Upload to DockerHub
>*It's recommended to upload built images to DockerHub* >*It's recommended to upload built images to DockerHub*
To upload the image to DockerHub run: To upload the image to DockerHub run:
```sh ```sh
docker push [options] <docker-hub>/ckan:<image-tag> docker push [options] <docker-hub-namespace>/ckan:<image-tag>
``` ```
## Upgrade
To upgrade the Docker file to use new CKAN version, in the Dockerfile you should change:
>ENV GIT_BRANCH={ckan_release}
Check [CKAN repository](https://github.com/ckan/ckan/releases) for the latest releases.
If there are new libraries used by the new version requirements, those needs to be included too.
## Extensions
Default extensions used in the Dockerfile are kept in:
>ENV CKAN__PLUGINS envvars image_view text_view recline_view datastore datapusher
## Add new scripts
You can add scripts to CKAN custom images and copy them to the *docker-entrypoint.d* directory. Any *.sh or *.py file in that directory will be executed after the main initialization script (prerun.py) is executed.

25
compose/.ckan-env Normal file
View File

@ -0,0 +1,25 @@
# Runtime configuration of CKAN enabled through ckanext-envvars
# Information about how it works: https://github.com/okfn/ckanext-envvars
# Note that variables here take presedence over build/up time variables in .env
# General Settings
CKAN_SITE_ID=default
CKAN_SITE_URL=http://localhost:5000
CKAN_PORT=5000
CKAN_MAX_UPLOAD_SIZE_MB=10
# CKAN Plugins
CKAN__PLUGINS=envvars image_view text_view recline_view datastore datapusher
# CKAN requires storage path to be set in order for filestore to be enabled
CKAN__STORAGE_PATH=/srv/app/data
CKAN__WEBASSETS__PATH=/srv/app/data/webassets
# SYSADMIN settings, a sysadmin user is created automatically with the below credentials
CKAN_SYSADMIN_NAME=sysadmin
CKAN_SYSADMIN_PASSWORD=password
CKAN_SYSADMIN_EMAIL=sysadmin@ckantest.com
# Email settings
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user
CKAN_SMTP_PASSWORD=pass
CKAN_SMTP_MAIL_FROM=ckan@localhost

29
compose/.env Normal file
View File

@ -0,0 +1,29 @@
# Variables in this file will be used as build arguments when running
# docker-compose build and docker-compose up
# Verify correct substitution with "docker-compose config"
# If variables are newly added or enabled, please delete and rebuild the images to pull in changes:
# docker-compose down -v
# docker-compose build
# docker-compose up -d
# Database
POSTGRES_PASSWORD=ckan
POSTGRES_PORT=5432
DATASTORE_READONLY_PASSWORD=datastore
# CKAN
CKAN_VERSION=2.9.0
CKAN_SITE_ID=default
CKAN_SITE_URL=http://localhost:5000
CKAN_PORT=5000
CKAN_MAX_UPLOAD_SIZE_MB=10
# Datapusher
DATAPUSHER_VERSION=0.0.17
DATAPUSHER_MAX_CONTENT_LENGTH=10485760
DATAPUSHER_CHUNK_SIZE=16384
DATAPUSHER_CHUNK_INSERT_ROWS=250
DATAPUSHER_DOWNLOAD_TIMEOUT=30
# Redis
REDIS_VERSION=6.0.7

View File

@ -0,0 +1,88 @@
# docker-compose build && docker-compose up -d
version: "3"
volumes:
ckan_data:
pg_data:
solr_data:
services:
ckan:
container_name: ckan
image: keitaro/ckan:${CKAN_VERSION}
networks:
- frontend
- backend
depends_on:
- db
ports:
- "0.0.0.0:${CKAN_PORT}:5000"
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
volumes:
- ckan_data:/srv/app/data
datapusher:
container_name: datapusher
image: keitaro/ckan-datapusher:${DATAPUSHER_VERSION}
networks:
- frontend
- backend
ports:
- "8000:8000"
environment:
- DATAPUSHER_MAX_CONTENT_LENGTH=${DATAPUSHER_MAX_CONTENT_LENGTH}
- DATAPUSHER_CHUNK_SIZE=${DATAPUSHER_CHUNK_SIZE}
- DATAPUSHER_CHUNK_INSERT_ROWS=${DATAPUSHER_CHUNK_INSERT_ROWS}
- DATAPUSHER_DOWNLOAD_TIMEOUT=${DATAPUSHER_DOWNLOAD_TIMEOUT}
db:
container_name: db
build:
context: .
dockerfile: postgresql/Dockerfile
args:
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
networks:
- backend
environment:
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
volumes:
- pg_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "ckan"]
solr:
container_name: solr
build:
context: .
dockerfile: solr/Dockerfile
args:
- CKAN_VERSION=${CKAN_VERSION}
networks:
- backend
volumes:
- solr_data:/opt/solr/server/solr/ckan/data
redis:
container_name: redis
image: redis:${REDIS_VERSION}
networks:
- backend
networks:
frontend:
backend:

View File

@ -0,0 +1,13 @@
FROM mdillon/postgis:11
# Allow connections; we don't map out any ports so only linked docker containers can connect
RUN echo "host all all 0.0.0.0/0 md5" >> /var/lib/postgresql/data/pg_hba.conf
# Customize default user/pass/db
ENV POSTGRES_DB ckan
ENV POSTGRES_USER ckan
ARG POSTGRES_PASSWORD
ARG DS_RO_PASS
# Include datastore setup scripts
ADD ./postgresql/docker-entrypoint-initdb.d /docker-entrypoint-initdb.d

View File

@ -0,0 +1,8 @@
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
CREATE ROLE datastore_ro NOSUPERUSER NOCREATEDB NOCREATEROLE LOGIN PASSWORD '$DS_RO_PASS';
CREATE DATABASE datastore OWNER ckan ENCODING 'utf-8';
GRANT ALL PRIVILEGES ON DATABASE datastore TO ckan;
EOSQL

View File

@ -0,0 +1,3 @@
CREATE EXTENSION POSTGIS;
ALTER VIEW geometry_columns OWNER TO ckan;
ALTER TABLE spatial_ref_sys OWNER TO ckan;

34
compose/solr/Dockerfile Normal file
View File

@ -0,0 +1,34 @@
FROM solr:6.6.6
# Enviroment
ENV SOLR_CORE ckan
ENV SOLR_VERSION 6.6.6
# Build Arguments
ARG CKAN_VERSION
# Create Directories
RUN mkdir -p /opt/solr/server/solr/$SOLR_CORE/conf
RUN mkdir -p /opt/solr/server/solr/$SOLR_CORE/data
# Adding Files
ADD ./solr/solrconfig-$CKAN_VERSION.xml \
https://raw.githubusercontent.com/ckan/ckan/ckan-$CKAN_VERSION/ckan/config/solr/schema.xml \
https://raw.githubusercontent.com/apache/lucene-solr/releases/lucene-solr/$SOLR_VERSION/solr/server/solr/configsets/basic_configs/conf/currency.xml \
https://raw.githubusercontent.com/apache/lucene-solr/releases/lucene-solr/$SOLR_VERSION/solr/server/solr/configsets/basic_configs/conf/synonyms.txt \
https://raw.githubusercontent.com/apache/lucene-solr/releases/lucene-solr/$SOLR_VERSION/solr/server/solr/configsets/basic_configs/conf/stopwords.txt \
https://raw.githubusercontent.com/apache/lucene-solr/releases/lucene-solr/$SOLR_VERSION/solr/server/solr/configsets/basic_configs/conf/protwords.txt \
https://raw.githubusercontent.com/apache/lucene-solr/releases/lucene-solr/$SOLR_VERSION/solr/server/solr/configsets/data_driven_schema_configs/conf/elevate.xml \
/opt/solr/server/solr/$SOLR_CORE/conf/
# Create Core.properties
RUN mv /opt/solr/server/solr/$SOLR_CORE/conf/solrconfig-$CKAN_VERSION.xml /opt/solr/server/solr/$SOLR_CORE/conf/solrconfig.xml && \
echo name=$SOLR_CORE > /opt/solr/server/solr/$SOLR_CORE/core.properties
# Giving ownership to Solr
USER root
RUN chown -R $SOLR_USER:$SOLR_USER /opt/solr/server/solr/$SOLR_CORE
# User
USER $SOLR_USER:$SOLR_USER

View File

@ -0,0 +1,343 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--solrconfig.xml documentation [https://wiki.apache.org/solr/SolrConfigXml]-->
<config>
<luceneMatchVersion>6.0.0</luceneMatchVersion>
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />
<dataDir>${solr.data.dir:}</dataDir>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
<codecFactory class="solr.SchemaCodecFactory" />
<indexConfig>
<lockType>${solr.lock.type:native}</lockType>
</indexConfig>
<jmx />
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0" />
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />
<cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries" />
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries" />
</listener>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="false">
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048" addHttpRequestToContext="false" />
<httpCaching never304="true" />
</requestDispatcher>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
<requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
</requestHandler>
<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">_text_</str>
</lst>
</initParams>
<initParams path="/update/**">
<lst name="defaults">
<str name="update.chain">add-unknown-fields-to-the-schema</str>
</lst>
</initParams>
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy" />
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">_text_</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="tvComponent" class="solr.TermVectorComponent" />
<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>
<searchComponent name="terms" class="solr.TermsComponent" />
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
<searchComponent name="elevator" class="solr.QueryElevationComponent">
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<int name="hl.fragsize">70</int>
<float name="hl.regex.slop">0.5</float>
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<encoder name="html" class="solr.highlight.HtmlEncoder" />
<fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder" />
<fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder" />
<fragListBuilder name="weighted" default="true" class="solr.highlight.WeightedFragListBuilder" />
<fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder" />
<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!?</str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.type">WORD</str>
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
</searchComponent>
<schemaFactory class="ClassicIndexSchemaFactory" />
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<processor class="solr.UUIDUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.DistributedUpdateProcessorFactory" />
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
<processor class="solr.FieldNameMutatingUpdateProcessorFactory">
<str name="pattern">[^\w-\.]</str>
<str name="replacement">_</str>
</processor>
<processor class="solr.ParseBooleanFieldUpdateProcessorFactory" />
<processor class="solr.ParseLongFieldUpdateProcessorFactory" />
<processor class="solr.ParseDoubleFieldUpdateProcessorFactory" />
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<arr name="format">
<str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
<str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
<str>yyyy-MM-dd'T'HH:mm:ssZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss</str>
<str>yyyy-MM-dd'T'HH:mmZ</str>
<str>yyyy-MM-dd'T'HH:mm</str>
<str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
<str>yyyy-MM-dd HH:mm:ss.SSS</str>
<str>yyyy-MM-dd HH:mm:ss,SSS</str>
<str>yyyy-MM-dd HH:mm:ssZ</str>
<str>yyyy-MM-dd HH:mm:ss</str>
<str>yyyy-MM-dd HH:mmZ</str>
<str>yyyy-MM-dd HH:mm</str>
<str>yyyy-MM-dd</str>
</arr>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<queryResponseWriter name="json" class="solr.JSONResponseWriter">
<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
<str name="template.base.dir">${velocity.template.base.dir:}</str>
<str name="solr.resource.loader.enabled">${velocity.solr.resource.loader.enabled:true}</str>
<str name="params.resource.loader.enabled">${velocity.params.resource.loader.enabled:false}</str>
</queryResponseWriter>
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
</config>

View File

@ -0,0 +1,345 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--solrconfig.xml documentation [https://wiki.apache.org/solr/SolrConfigXml]-->
<config>
<luceneMatchVersion>6.0.0</luceneMatchVersion>
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />
<dataDir>${solr.data.dir:}</dataDir>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
<codecFactory class="solr.SchemaCodecFactory" />
<indexConfig>
<lockType>${solr.lock.type:native}</lockType>
</indexConfig>
<jmx />
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0" />
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />
<cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries" />
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries" />
</listener>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="false">
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048" addHttpRequestToContext="false" />
<httpCaching never304="true" />
</requestDispatcher>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
<requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
</requestHandler>
<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">_text_</str>
</lst>
</initParams>
<initParams path="/update/**">
<lst name="defaults">
<str name="update.chain">add-unknown-fields-to-the-schema</str>
</lst>
</initParams>
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy" />
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">_text_</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.5</float>
<int name="maxEdits">2</int>
<int name="minPrefix">1</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="tvComponent" class="solr.TermVectorComponent" />
<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>
<searchComponent name="terms" class="solr.TermsComponent" />
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
<searchComponent name="elevator" class="solr.QueryElevationComponent">
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<int name="hl.fragsize">70</int>
<float name="hl.regex.slop">0.5</float>
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<encoder name="html" class="solr.highlight.HtmlEncoder" />
<fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder" />
<fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder" />
<fragListBuilder name="weighted" default="true" class="solr.highlight.WeightedFragListBuilder" />
<fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder" />
<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!?</str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.type">WORD</str>
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
</searchComponent>
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
</schemaFactory>
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<processor class="solr.UUIDUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.DistributedUpdateProcessorFactory" />
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
<processor class="solr.FieldNameMutatingUpdateProcessorFactory">
<str name="pattern">[^\w-\.]</str>
<str name="replacement">_</str>
</processor>
<processor class="solr.ParseBooleanFieldUpdateProcessorFactory" />
<processor class="solr.ParseLongFieldUpdateProcessorFactory" />
<processor class="solr.ParseDoubleFieldUpdateProcessorFactory" />
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<arr name="format">
<str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
<str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
<str>yyyy-MM-dd'T'HH:mm:ssZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss</str>
<str>yyyy-MM-dd'T'HH:mmZ</str>
<str>yyyy-MM-dd'T'HH:mm</str>
<str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
<str>yyyy-MM-dd HH:mm:ss.SSS</str>
<str>yyyy-MM-dd HH:mm:ss,SSS</str>
<str>yyyy-MM-dd HH:mm:ssZ</str>
<str>yyyy-MM-dd HH:mm:ss</str>
<str>yyyy-MM-dd HH:mmZ</str>
<str>yyyy-MM-dd HH:mm</str>
<str>yyyy-MM-dd</str>
</arr>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<queryResponseWriter name="json" class="solr.JSONResponseWriter">
<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
<str name="template.base.dir">${velocity.template.base.dir:}</str>
<str name="solr.resource.loader.enabled">${velocity.solr.resource.loader.enabled:true}</str>
<str name="params.resource.loader.enabled">${velocity.params.resource.loader.enabled:false}</str>
</queryResponseWriter>
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
</config>

View File

@ -0,0 +1,29 @@
# Runtime configuration of CKAN enabled through ckanext-envvars
# Information about how it works: https://github.com/okfn/ckanext-envvars
# Note that variables here take presedence over build/up time variables in .env
# General Settings
CKAN_SITE_ID=default
CKAN_SITE_URL=http://localhost:5000
CKAN_PORT=5000
CKAN_MAX_UPLOAD_SIZE_MB=10
# CKAN Plugins
CKAN__PLUGINS=envvars image_view text_view recline_view datastore datapusher harvest ckan_harvester
# CKAN requires storage path to be set in order for filestore to be enabled
CKAN__STORAGE_PATH=/srv/app/data
CKAN__WEBASSETS__PATH=/srv/app/data/webassets
# SYSADMIN settings, a sysadmin user is created automatically with the below credentials
CKAN_SYSADMIN_NAME=sysadmin
CKAN_SYSADMIN_PASSWORD=password
CKAN_SYSADMIN_EMAIL=sysadmin@ckantest.com
# Email settings
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user
CKAN_SMTP_PASSWORD=pass
CKAN_SMTP_MAIL_FROM=ckan@localhost
# Harvest settings
CKAN__HARVEST__MQ__TYPE=redis
CKAN__HARVEST__MQ__HOSTNAME=redis

29
examples/harvest/.env Normal file
View File

@ -0,0 +1,29 @@
# Variables in this file will be used as build arguments when running
# docker-compose build and docker-compose up
# Verify correct substitution with "docker-compose config"
# If variables are newly added or enabled, please delete and rebuild the images to pull in changes:
# docker-compose down -v
# docker-compose build
# docker-compose up -d
# Database
POSTGRES_PASSWORD=ckan
POSTGRES_PORT=5432
DATASTORE_READONLY_PASSWORD=datastore
# CKAN
CKAN_VERSION=2.9.0
CKAN_SITE_ID=default
CKAN_SITE_URL=http://localhost:5000
CKAN_PORT=5000
CKAN_MAX_UPLOAD_SIZE_MB=10
# Datapusher
DATAPUSHER_VERSION=0.0.17
DATAPUSHER_MAX_CONTENT_LENGTH=10485760
DATAPUSHER_CHUNK_SIZE=16384
DATAPUSHER_CHUNK_INSERT_ROWS=250
DATAPUSHER_DOWNLOAD_TIMEOUT=30
# Redis
REDIS_VERSION=6.0.7

View File

@ -0,0 +1,55 @@
###################
### Extensions ####
###################
FROM keitaro/ckan:2.9.0 as extbuild
MAINTAINER Keitaro Inc <info@keitaro.com>
# Locations and tags, please use specific tags or revisions
ENV HARVEST_GIT_URL=https://github.com/ckan/ckanext-harvest
ENV HARVEST_GIT_BRANCH=v1.3.1
# Switch to the root user
USER root
# Install necessary packages to build extensions
RUN apk add --no-cache \
gcc \
g++ \
libffi-dev \
openssl-dev \
python3-dev
# Fetch and build the custom CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+${HARVEST_GIT_URL}@${HARVEST_GIT_BRANCH}#egg=ckanext-harvest
RUN pip wheel --wheel-dir=/wheels -r https://raw.githubusercontent.com/ckan/ckanext-harvest/${HARVEST_GIT_BRANCH}/pip-requirements.txt
RUN curl -o /wheels/harvest.txt https://raw.githubusercontent.com/ckan/ckanext-harvest/${HARVEST_GIT_BRANCH}/pip-requirements.txt
############
### MAIN ###
############
FROM keitaro/ckan:2.9.0
ENV CKAN__PLUGINS envvars image_view text_view recline_view datastore datapusher harvest ckan_harvester
# Switch to the root user
USER root
COPY --from=extbuild /wheels /srv/app/ext_wheels
# Install and enable the custom extensions
RUN pip install --no-index --find-links=/srv/app/ext_wheels ckanext-harvest && \
pip install --no-index --find-links=/srv/app/ext_wheels -r /srv/app/ext_wheels/harvest.txt && \
# Not working atm since ckan config tool tries to load config before executing config-tool, workaround
# ckan -c ${APP_DIR}/production.ini config-tool "ckan.plugins = ${CKAN__PLUGINS}" && \
sed -i "/ckan.plugins = envvars/c ckan.plugins = ${CKAN__PLUGINS}" ${APP_DIR}/production.ini && \
chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/ext_wheels
# Add harvest entrypoint script
COPY ./scripts/00_harvest.sh ${APP_DIR}/docker-entrypoint.d/00_harvest.sh
# Switch to the ckan user
USER ckan

View File

@ -0,0 +1,191 @@
# docker-compose build && docker-compose up -d
version: "3"
volumes:
ckan_data:
pg_data:
solr_data:
services:
ckan:
container_name: ckan
build:
context: .
networks:
- frontend
- backend
depends_on:
- db
ports:
- "0.0.0.0:${CKAN_PORT}:5000"
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
volumes:
- ckan_data:/srv/app/data
ckan-harvest-gather:
container_name: ckan-harvest-gather
build:
context: .
command: ckan -c /srv/app/production.ini harvester gather_consumer
restart: on-failure
networks:
- frontend
- backend
depends_on:
- db
- ckan
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
ckan-harvest-fetch:
container_name: ckan-harvest-fetch
build:
context: .
command: ckan -c /srv/app/production.ini harvester fetch_consumer
restart: on-failure
networks:
- frontend
- backend
depends_on:
- db
- ckan
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
ckan-harvest-run:
container_name: ckan-harvest-run
build:
context: .
command: /bin/sh -c "while true; do sleep 900; ckan -c /srv/app/production.ini harvester run; done"
networks:
- frontend
- backend
depends_on:
- db
- ckan
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
ckan-harvest-cleanlog:
container_name: ckan-harvest-cleanlog
build:
context: .
command: /bin/sh -c "while true; do sleep 86400; ckan -c /srv/app/production.ini harvester clean_harvest_log; done"
networks:
- frontend
- backend
depends_on:
- db
- ckan
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
datapusher:
container_name: datapusher
image: keitaro/ckan-datapusher:${DATAPUSHER_VERSION}
networks:
- frontend
- backend
ports:
- "8000:8000"
environment:
- DATAPUSHER_MAX_CONTENT_LENGTH=${DATAPUSHER_MAX_CONTENT_LENGTH}
- DATAPUSHER_CHUNK_SIZE=${DATAPUSHER_CHUNK_SIZE}
- DATAPUSHER_CHUNK_INSERT_ROWS=${DATAPUSHER_CHUNK_INSERT_ROWS}
- DATAPUSHER_DOWNLOAD_TIMEOUT=${DATAPUSHER_DOWNLOAD_TIMEOUT}
db:
container_name: db
build:
context: ../../compose
dockerfile: postgresql/Dockerfile
args:
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
networks:
- backend
environment:
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
volumes:
- pg_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "ckan"]
solr:
container_name: solr
build:
context: ../../compose
dockerfile: solr/Dockerfile
args:
- CKAN_VERSION=${CKAN_VERSION}
networks:
- backend
volumes:
- solr_data:/opt/solr/server/solr/ckan/data
redis:
container_name: redis
image: redis:${REDIS_VERSION}
networks:
- backend
networks:
frontend:
backend:

View File

@ -0,0 +1,2 @@
#!/bin/sh
ckan -c /srv/app/production.ini harvester initdb

View File

@ -0,0 +1,33 @@
# Runtime configuration of CKAN enabled through ckanext-envvars
# Information about how it works: https://github.com/okfn/ckanext-envvars
# Note that variables here take presedence over build/up time variables in .env
# General Settings
CKAN_SITE_ID=default
CKAN_SITE_URL=http://localhost:5000
CKAN_PORT=5000
CKAN_MAX_UPLOAD_SIZE_MB=10
# CKAN Plugins
CKAN__PLUGINS=envvars s3filestore image_view text_view recline_view datastore datapusher
# CKAN requires storage path to be set in order for filestore to be enabled
CKAN__STORAGE_PATH=/srv/app/data
CKAN__WEBASSETS__PATH=/srv/app/data/webassets
# SYSADMIN settings, a sysadmin user is created automatically with the below credentials
CKAN_SYSADMIN_NAME=sysadmin
CKAN_SYSADMIN_PASSWORD=password
CKAN_SYSADMIN_EMAIL=sysadmin@ckantest.com
# Email settings
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user
CKAN_SMTP_PASSWORD=pass
CKAN_SMTP_MAIL_FROM=ckan@localhost
# S3/MINIO settings
CKANEXT__S3FILESTORE__AWS_ACCESS_KEY_ID = MINIOACCESSKEY
CKANEXT__S3FILESTORE__AWS_SECRET_ACCESS_KEY = MINIOSECRETKEY
CKANEXT__S3FILESTORE__AWS_BUCKET_NAME = ckan
CKANEXT__S3FILESTORE__HOST_NAME = http://minio:9000
CKANEXT__S3FILESTORE__REGION_NAME = us-east-1
CKANEXT__S3FILESTORE__SIGNATURE_VERSION = s3v4

29
examples/s3filestore/.env Normal file
View File

@ -0,0 +1,29 @@
# Variables in this file will be used as build arguments when running
# docker-compose build and docker-compose up
# Verify correct substitution with "docker-compose config"
# If variables are newly added or enabled, please delete and rebuild the images to pull in changes:
# docker-compose down -v
# docker-compose build
# docker-compose up -d
# Database
POSTGRES_PASSWORD=ckan
POSTGRES_PORT=5432
DATASTORE_READONLY_PASSWORD=datastore
# CKAN
CKAN_VERSION=2.8.5
CKAN_SITE_ID=default
CKAN_SITE_URL=http://localhost:5000
CKAN_PORT=5000
CKAN_MAX_UPLOAD_SIZE_MB=10
# Datapusher
DATAPUSHER_VERSION=0.0.17
DATAPUSHER_MAX_CONTENT_LENGTH=10485760
DATAPUSHER_CHUNK_SIZE=16384
DATAPUSHER_CHUNK_INSERT_ROWS=250
DATAPUSHER_DOWNLOAD_TIMEOUT=30
# Redis
REDIS_VERSION=6.0.7

View File

@ -0,0 +1,43 @@
###################
### Extensions ####
###################
FROM keitaro/ckan:2.8.5 as extbuild
MAINTAINER Keitaro Inc <info@keitaro.com>
# Locations and tags, please use specific tags or revisions
ENV S3FILESTORE_GIT_URL=https://github.com/keitaroinc/ckanext-s3filestore
ENV S3FILESTORE_GIT_BRANCH=master
# Switch to the root user
USER root
# Fetch and build the custom CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+${S3FILESTORE_GIT_URL}@${S3FILESTORE_GIT_BRANCH}#egg=ckanext-s3filestore
RUN pip wheel --wheel-dir=/wheels -r https://raw.githubusercontent.com/keitaroinc/ckanext-s3filestore/${S3FILESTORE_GIT_BRANCH}/requirements.txt
RUN curl -o /wheels/s3filestore.txt https://raw.githubusercontent.com/keitaroinc/ckanext-s3filestore/${S3FILESTORE_GIT_BRANCH}/requirements.txt
############
### MAIN ###
############
FROM keitaro/ckan:2.8.5
MAINTAINER Keitaro Inc <info@keitaro.com>
ENV CKAN__PLUGINS envvars s3filestore image_view text_view recline_view datastore datapusher
# Switch to the root user
USER root
COPY --from=extbuild /wheels /srv/app/ext_wheels
# Install and enable the custom extensions
RUN pip install --no-index --find-links=/srv/app/ext_wheels ckanext-s3filestore && \
pip install --no-index --find-links=/srv/app/ext_wheels -r /srv/app/ext_wheels/s3filestore.txt && \
paster --plugin=ckan config-tool ${APP_DIR}/production.ini "ckan.plugins = ${CKAN__PLUGINS}" && \
chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/ext_wheels
USER ckan

View File

@ -0,0 +1,117 @@
# docker-compose build && docker-compose up -d
version: "3"
volumes:
minio_data:
pg_data:
solr_data:
services:
ckan:
container_name: ckan
build:
context: .
networks:
- frontend
- backend
depends_on:
- db
ports:
- "0.0.0.0:${CKAN_PORT}:5000"
env_file:
- ./.ckan-env
environment:
- CKAN_SQLALCHEMY_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/ckan
- CKAN_DATASTORE_WRITE_URL=postgresql://ckan:${POSTGRES_PASSWORD}@db/datastore
- CKAN_DATASTORE_READ_URL=postgresql://datastore_ro:${DATASTORE_READONLY_PASSWORD}@db/datastore
- CKAN_SOLR_URL=http://solr:8983/solr/ckan
- CKAN_REDIS_URL=redis://redis:6379/1
- CKAN_DATAPUSHER_URL=http://datapusher:8000
- CKAN_SITE_URL=${CKAN_SITE_URL}
- CKAN_MAX_UPLOAD_SIZE_MB=${CKAN_MAX_UPLOAD_SIZE_MB}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
datapusher:
container_name: datapusher
image: keitaro/ckan-datapusher:${DATAPUSHER_VERSION}
networks:
- frontend
- backend
ports:
- "8000:8000"
environment:
- DATAPUSHER_MAX_CONTENT_LENGTH=${DATAPUSHER_MAX_CONTENT_LENGTH}
- DATAPUSHER_CHUNK_SIZE=${DATAPUSHER_CHUNK_SIZE}
- DATAPUSHER_CHUNK_INSERT_ROWS=${DATAPUSHER_CHUNK_INSERT_ROWS}
- DATAPUSHER_DOWNLOAD_TIMEOUT=${DATAPUSHER_DOWNLOAD_TIMEOUT}
db:
container_name: db
build:
context: ../../compose
dockerfile: postgresql/Dockerfile
args:
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
networks:
- backend
environment:
- DS_RO_PASS=${DATASTORE_READONLY_PASSWORD}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
volumes:
- pg_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "ckan"]
solr:
container_name: solr
build:
context: ../../compose
dockerfile: solr/Dockerfile
args:
- CKAN_VERSION=${CKAN_VERSION}
networks:
- backend
volumes:
- solr_data:/opt/solr/server/solr/ckan/data
redis:
container_name: redis
image: redis:${REDIS_VERSION}
networks:
- backend
minio:
container_name: minio
image: minio/minio:RELEASE.2020-08-08T04-50-06Z
networks:
- backend
- frontend
ports:
- "0.0.0.0:9000:9000"
environment:
- MINIO_ACCESS_KEY=MINIOACCESSKEY
- MINIO_SECRET_KEY=MINIOSECRETKEY
volumes:
- minio_data:/data
command: server /data
mc:
container_name: mc
image: minio/mc:RELEASE.2020-08-08T02-33-58Z
networks:
- backend
depends_on:
- minio
entrypoint: >
/bin/sh -c "
/usr/bin/mc config host rm local;
/usr/bin/mc config host add --api s3v4 local http://minio:9000 MINIOACCESSKEY MINIOSECRETKEY;
/usr/bin/mc mb local/ckan/;
/usr/bin/mc policy set download local/ckan/storage;
"
networks:
frontend:
backend:

170
images/ckan/2.7/Dockerfile Normal file
View File

@ -0,0 +1,170 @@
##################
### Build CKAN ###
##################
FROM alpine:3.12 as ckanbuild
# Set CKAN version to build
ENV GIT_URL=https://github.com/ckan/ckan.git
ENV GIT_BRANCH=ckan-2.7.8
# Set src dirs
ENV SRC_DIR=/srv/app/src
ENV PIP_SRC=${SRC_DIR}
WORKDIR ${SRC_DIR}
# Packages to build CKAN requirements and plugins
RUN apk add --no-cache \
git \
curl \
python2 \
postgresql-dev \
linux-headers \
gcc \
make \
g++ \
autoconf \
automake \
libtool \
musl-dev \
pcre-dev \
pcre \
python2-dev \
libffi-dev \
libxml2-dev \
libxslt-dev
# Install version 9.x of postgresql-dev so that CKAN requirements can be built
RUN apk add --repository http://dl-cdn.alpinelinux.org/alpine/v3.6/main \
postgresql-dev~=9.6
# Create the src directory
RUN mkdir -p ${SRC_DIR}
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python ${SRC_DIR}/get-pip.py
# Fetch and build CKAN and requirements
RUN pip install -e git+${GIT_URL}@${GIT_BRANCH}#egg=ckan
RUN rm -rf /srv/app/src/ckan/.git
RUN pip wheel --wheel-dir=/wheels -r ckan/requirements.txt
RUN pip wheel --wheel-dir=/wheels uwsgi==2.0.19.1 gevent==20.6.2
###########################
### Default-Extensions ####
###########################
FROM alpine:3.12 as extbuild
# Set src dirs
ENV SRC_DIR=/srv/app/src
ENV PIP_SRC=${SRC_DIR}
# List of default extensions
ENV DEFAULT_EXTENSIONS envvars
# Locations and tags, please use specific tags or revisions
ENV ENVVARS_GIT_URL=https://github.com/okfn/ckanext-envvars
ENV ENVVARS_GIT_BRANCH=0.0.1
RUN apk add --no-cache \
git \
curl \
python2 \
python2-dev
# Create the src directory
RUN mkdir -p ${SRC_DIR}
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python ${SRC_DIR}/get-pip.py
# Fetch and build the default CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+${ENVVARS_GIT_URL}@${ENVVARS_GIT_BRANCH}#egg=ckanext-envvars
############
### MAIN ###
############
FROM alpine:3.12
MAINTAINER Keitaro Inc <info@keitaro.com>
ENV APP_DIR=/srv/app
ENV SRC_DIR=/srv/app/src
ENV DATA_DIR=/srv/app/data
ENV PIP_SRC=${SRC_DIR}
ENV CKAN_SITE_URL=http://localhost:5000
ENV CKAN__PLUGINS envvars image_view text_view recline_view datastore datapusher
WORKDIR ${APP_DIR}
# Install necessary packages to run CKAN
RUN apk add --no-cache git \
bash \
gettext \
curl \
python2 \
libmagic \
pcre \
libxslt \
libxml2 \
tzdata \
apache2-utils && \
# Install version 9.x of postgresql-client so that CKAN can run
apk add --repository http://dl-cdn.alpinelinux.org/alpine/v3.6/main \
postgresql-client~=9.6 && \
# Create SRC_DIR
mkdir -p ${SRC_DIR}
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python ${SRC_DIR}/get-pip.py
# Get artifacts from build stages
COPY --from=ckanbuild /wheels /srv/app/wheels
COPY --from=extbuild /wheels /srv/app/ext_wheels
COPY --from=ckanbuild /srv/app/src/ckan /srv/app/src/ckan
# Additional install steps for build stages artifacts
RUN pip install --no-index --find-links=/srv/app/wheels uwsgi gevent
# Create a local user and group to run the app
RUN addgroup -g 92 -S ckan && \
adduser -u 92 -h /srv/app -H -D -S -G ckan ckan
# Install CKAN
RUN pip install -e /srv/app/src/ckan && \
cd ${SRC_DIR}/ckan && \
cp who.ini ${APP_DIR} && \
pip install --no-index --find-links=/srv/app/wheels -r requirements.txt && \
# Install default CKAN extensions
pip install --no-index --find-links=/srv/app/ext_wheels ckanext-envvars && \
# Create and update CKAN config
# Set timezone
echo "UTC" > /etc/timezone && \
# Generate CKAN config
paster --plugin=ckan make-config ckan ${APP_DIR}/production.ini && \
paster --plugin=ckan config-tool ${APP_DIR}/production.ini "ckan.plugins = ${CKAN__PLUGINS}" && \
# Create the data directory
mkdir ${DATA_DIR} && \
# Change ownership to app user
chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/wheels /srv/app/ext_wheels
# Copy necessary scripts
COPY setup/app ${APP_DIR}
# Create entrypoint directory for children image scripts
ONBUILD RUN mkdir docker-entrypoint.d
EXPOSE 5000
HEALTHCHECK --interval=10s --timeout=5s --retries=5 CMD curl --fail http://localhost:5000/api/3/action/status_show || exit 1
USER ckan
CMD ["/srv/app/start_ckan.sh"]

View File

@ -50,7 +50,6 @@ def check_solr_connection(retry=None):
try: try:
connection = urllib2.urlopen(search_url) connection = urllib2.urlopen(search_url)
except urllib2.URLError as e: except urllib2.URLError as e:
print((str(e)))
print('[prerun] Unable to connect to solr...try again in a while.') print('[prerun] Unable to connect to solr...try again in a while.')
import time import time
time.sleep(10) time.sleep(10)
@ -58,7 +57,6 @@ def check_solr_connection(retry=None):
else: else:
import re import re
conn_info = connection.read() conn_info = connection.read()
conn_info = re.sub(r'"zkConnected":true', '"zkConnected":True', conn_info)
eval(conn_info) eval(conn_info)
def init_db(): def init_db():
@ -75,7 +73,6 @@ def init_db():
print('[prerun] Initializing or upgrading db - end') print('[prerun] Initializing or upgrading db - end')
except subprocess.CalledProcessError as e: except subprocess.CalledProcessError as e:
if 'OperationalError' in e.output: if 'OperationalError' in e.output:
print((e.output))
print('[prerun] Database not ready, waiting a bit before exit...') print('[prerun] Database not ready, waiting a bit before exit...')
import time import time
time.sleep(5) time.sleep(5)
@ -183,4 +180,3 @@ if __name__ == '__main__':
if os.environ.get('CKAN_DATASTORE_WRITE_URL'): if os.environ.get('CKAN_DATASTORE_WRITE_URL'):
init_datastore() init_datastore()
create_sysadmin() create_sysadmin()
#time.sleep(60000) # don't end the prerun script to allow container dock and debug

View File

@ -5,7 +5,7 @@ FROM alpine:3.12 as ckanbuild
# Set CKAN version to build # Set CKAN version to build
ENV GIT_URL=https://github.com/ckan/ckan.git ENV GIT_URL=https://github.com/ckan/ckan.git
ENV GIT_BRANCH=ckan-2.8.4 ENV GIT_BRANCH=ckan-2.8.5
# Set src dirs # Set src dirs
ENV SRC_DIR=/srv/app/src ENV SRC_DIR=/srv/app/src
@ -45,7 +45,7 @@ RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
RUN pip install -e git+${GIT_URL}@${GIT_BRANCH}#egg=ckan RUN pip install -e git+${GIT_URL}@${GIT_BRANCH}#egg=ckan
RUN rm -rf /srv/app/src/ckan/.git RUN rm -rf /srv/app/src/ckan/.git
RUN pip wheel --wheel-dir=/wheels -r ckan/requirements.txt RUN pip wheel --wheel-dir=/wheels -r ckan/requirements.txt
RUN pip wheel --wheel-dir=/wheels uwsgi gevent RUN pip wheel --wheel-dir=/wheels uwsgi==2.0.19.1 gevent==20.6.2
########################### ###########################
@ -58,13 +58,11 @@ ENV SRC_DIR=/srv/app/src
ENV PIP_SRC=${SRC_DIR} ENV PIP_SRC=${SRC_DIR}
# List of default extensions # List of default extensions
ENV DEFAULT_EXTENSIONS envvars s3filestore ENV DEFAULT_EXTENSIONS envvars
# Locations and tags, please use specific tags or revisions # Locations and tags, please use specific tags or revisions
ENV ENVVARS_GIT_URL=https://github.com/okfn/ckanext-envvars ENV ENVVARS_GIT_URL=https://github.com/okfn/ckanext-envvars
ENV ENVVARS_GIT_BRANCH=0.0.1 ENV ENVVARS_GIT_BRANCH=0.0.1
ENV S3FILESTORE_GIT_URL=https://github.com/keitaroinc/ckanext-s3filestore
ENV S3FILESTORE_GIT_BRANCH=efd5711
RUN apk add --no-cache \ RUN apk add --no-cache \
git \ git \
@ -81,9 +79,6 @@ RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
# Fetch and build the default CKAN extensions # Fetch and build the default CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+${ENVVARS_GIT_URL}@${ENVVARS_GIT_BRANCH}#egg=ckanext-envvars RUN pip wheel --wheel-dir=/wheels git+${ENVVARS_GIT_URL}@${ENVVARS_GIT_BRANCH}#egg=ckanext-envvars
RUN pip wheel --wheel-dir=/wheels git+${S3FILESTORE_GIT_URL}@${S3FILESTORE_GIT_BRANCH}#egg=ckanext-s3filestore
RUN pip wheel --wheel-dir=/wheels -r https://raw.githubusercontent.com/keitaroinc/ckanext-s3filestore/${S3FILESTORE_GIT_BRANCH}/requirements.txt
RUN curl -o /wheels/s3filestore.txt https://raw.githubusercontent.com/keitaroinc/ckanext-s3filestore/${S3FILESTORE_GIT_BRANCH}/requirements.txt
############ ############
### MAIN ### ### MAIN ###
@ -94,9 +89,10 @@ MAINTAINER Keitaro Inc <info@keitaro.com>
ENV APP_DIR=/srv/app ENV APP_DIR=/srv/app
ENV SRC_DIR=/srv/app/src ENV SRC_DIR=/srv/app/src
ENV DATA_DIR=/srv/app/data
ENV PIP_SRC=${SRC_DIR} ENV PIP_SRC=${SRC_DIR}
ENV CKAN_SITE_URL=http://localhost:5000 ENV CKAN_SITE_URL=http://localhost:5000
ENV CKAN__PLUGINS envvars s3filestore image_view text_view recline_view datastore datapusher ENV CKAN__PLUGINS envvars image_view text_view recline_view datastore datapusher
WORKDIR ${APP_DIR} WORKDIR ${APP_DIR}
@ -138,14 +134,15 @@ RUN pip install -e /srv/app/src/ckan && \
cp who.ini ${APP_DIR} && \ cp who.ini ${APP_DIR} && \
pip install --no-index --find-links=/srv/app/wheels -r requirements.txt && \ pip install --no-index --find-links=/srv/app/wheels -r requirements.txt && \
# Install default CKAN extensions # Install default CKAN extensions
pip install --no-index --find-links=/srv/app/ext_wheels ckanext-envvars ckanext-s3filestore && \ pip install --no-index --find-links=/srv/app/ext_wheels ckanext-envvars && \
pip install --no-index --find-links=/srv/app/ext_wheels -r /srv/app/ext_wheels/s3filestore.txt && \
# Create and update CKAN config # Create and update CKAN config
# Set timezone # Set timezone
echo "UTC" > /etc/timezone && \ echo "UTC" > /etc/timezone && \
# Generate CKAN config # Generate CKAN config
paster --plugin=ckan make-config ckan ${APP_DIR}/production.ini && \ paster --plugin=ckan make-config ckan ${APP_DIR}/production.ini && \
paster --plugin=ckan config-tool ${APP_DIR}/production.ini "ckan.plugins = ${CKAN__PLUGINS}" && \ paster --plugin=ckan config-tool ${APP_DIR}/production.ini "ckan.plugins = ${CKAN__PLUGINS}" && \
# Create the data directory
mkdir ${DATA_DIR} && \
# Change ownership to app user # Change ownership to app user
chown -R ckan:ckan /srv/app chown -R ckan:ckan /srv/app

View File

@ -0,0 +1,4 @@
#!/bin/bash
# this is called before uwsgi is executed
# uset his to add extra scripts before ckan is started

View File

@ -0,0 +1,182 @@
import os
import sys
import subprocess
import psycopg2
import urllib2
import re
import time
ckan_ini = os.environ.get('CKAN_INI', '/srv/app/production.ini')
RETRY = 5
def check_db_connection(retry=None):
print('[prerun] Start check_db_connection...')
if retry is None:
retry = RETRY
elif retry == 0:
print('[prerun] Giving up after 5 tries...')
sys.exit(1)
conn_str = os.environ.get('CKAN_SQLALCHEMY_URL', '')
try:
connection = psycopg2.connect(conn_str)
except psycopg2.Error as e:
print((str(e)))
print('[prerun] Unable to connect to the database...try again in a while.')
import time
time.sleep(10)
check_db_connection(retry = retry - 1)
else:
connection.close()
def check_solr_connection(retry=None):
print('[prerun] Start check_solr_connection...')
if retry is None:
retry = RETRY
elif retry == 0:
print('[prerun] Giving up after 5 tries...')
sys.exit(1)
url = os.environ.get('CKAN_SOLR_URL', '')
search_url = '{url}/select/?q=*&wt=json'.format(url=url)
try:
connection = urllib2.urlopen(search_url)
except urllib2.URLError as e:
print('[prerun] Unable to connect to solr...try again in a while.')
import time
time.sleep(10)
check_solr_connection(retry = retry - 1)
else:
import re
conn_info = connection.read()
eval(conn_info)
def init_db():
print('[prerun] Start init_db...')
db_command = ['paster', '--plugin=ckan', 'db', 'init', '-c', ckan_ini]
print('[prerun] Initializing or upgrading db - start using paster db init')
try:
# run init scripts
subprocess.check_output(db_command, stderr=subprocess.STDOUT)
print('[prerun] Initializing or upgrading db - end')
except subprocess.CalledProcessError as e:
if 'OperationalError' in e.output:
print('[prerun] Database not ready, waiting a bit before exit...')
import time
time.sleep(5)
sys.exit(1)
else:
print((e.output))
raise e
print('[prerun] Initializing or upgrading db - finish')
def init_datastore():
conn_str = os.environ.get('CKAN_DATASTORE_WRITE_URL')
if not conn_str:
print('[prerun] Skipping datastore initialization')
return
datastore_perms_command = ['paster', '--plugin=ckan', 'datastore',
'set-permissions', '-c', ckan_ini]
connection = psycopg2.connect(conn_str)
cursor = connection.cursor()
print('[prerun] Initializing datastore db - start')
try:
datastore_perms = subprocess.Popen(
datastore_perms_command,
stdout=subprocess.PIPE)
perms_sql = datastore_perms.stdout.read()
# Remove internal pg command as psycopg2 does not like it
perms_sql = re.sub('\\\\connect \"(.*)\"', '', perms_sql.decode('utf-8'))
cursor.execute(perms_sql)
for notice in connection.notices:
print(notice)
connection.commit()
print('[prerun] Initializing datastore db - end')
print((datastore_perms.stdout.read()))
except psycopg2.Error as e:
print('[prerun] Could not initialize datastore')
print((str(e)))
except subprocess.CalledProcessError as e:
if 'OperationalError' in e.output:
print((e.output))
print('[prerun] Database not ready, waiting a bit before exit...')
time.sleep(5)
sys.exit(1)
else:
print((e.output))
raise e
finally:
cursor.close()
connection.close()
def create_sysadmin():
print('[prerun] Start create_sysadmin...')
name = os.environ.get('CKAN_SYSADMIN_NAME')
password = os.environ.get('CKAN_SYSADMIN_PASSWORD')
email = os.environ.get('CKAN_SYSADMIN_EMAIL')
if name and password and email:
# Check if user exists
command = ['paster', '--plugin=ckan', 'user', name, '-c', ckan_ini]
out = subprocess.check_output(command)
if 'User:None' not in re.sub(r'\s', '', out.decode('utf-8')):
print('[prerun] Sysadmin user exists, skipping creation')
return
# Create user
command = ['paster', '--plugin=ckan', 'user', 'add',
name,
'password=' + password,
'email=' + email,
'-c', ckan_ini]
subprocess.call(command)
print(('[prerun] Created user {0}'.format(name)))
# Make it sysadmin
command = ['paster', '--plugin=ckan', 'sysadmin', 'add',
name,
'-c', ckan_ini]
subprocess.call(command)
print(('[prerun] Made user {0} a sysadmin'.format(name)))
if __name__ == '__main__':
maintenance = os.environ.get('MAINTENANCE_MODE', '').lower() == 'true'
if maintenance:
print('[prerun] Maintenance mode, skipping setup...')
else:
check_db_connection()
check_solr_connection()
init_db()
if os.environ.get('CKAN_DATASTORE_WRITE_URL'):
init_datastore()
create_sysadmin()

View File

@ -0,0 +1,42 @@
#!/bin/bash
# Run any startup scripts provided by images extending this one
if [[ -d "${APP_DIR}/docker-entrypoint.d" ]]
then
for f in ${APP_DIR}/docker-entrypoint.d/*; do
case "$f" in
*.sh) echo "$0: Running init file $f"; . "$f" ;;
*.py) echo "$0: Running init file $f"; python "$f"; echo ;;
*) echo "$0: Ignoring $f (not an sh or py file)" ;;
esac
echo
done
fi
# Set the common uwsgi options
UWSGI_OPTS="--socket /tmp/uwsgi.sock --uid 92 --gid 92 --http :5000 --master --enable-threads --paste config:/srv/app/production.ini --paste-logger /srv/app/production.ini --lazy-apps --gevent 2000 -p 2 -L --gevent-early-monkey-patch"
# Run the prerun script to init CKAN and create the default admin user
python prerun.py
# Check whether http basic auth password protection is enabled and enable basicauth routing on uwsgi respecfully
if [ $? -eq 0 ]
then
if [ "$PASSWORD_PROTECT" = true ]
then
if [ "$HTPASSWD_USER" ] || [ "$HTPASSWD_PASSWORD" ]
then
# Generate htpasswd file for basicauth
htpasswd -d -b -c /srv/app/.htpasswd $HTPASSWD_USER $HTPASSWD_PASSWORD
# Start uwsgi with basicauth
uwsgi --ini /srv/app/uwsgi.conf --pcre-jit $UWSGI_OPTS
else
echo "Missing HTPASSWD_USER or HTPASSWD_PASSWORD environment variables. Exiting..."
exit 1
fi
else
# Start uwsgi
uwsgi $UWSGI_OPTS
fi
else
echo "[prerun] failed...not starting CKAN."
fi

View File

@ -0,0 +1,2 @@
[uwsgi]
route = ^(?!/api).*$ basicauth:Restricted,/srv/app/.htpasswd

179
images/ckan/2.9/Dockerfile Normal file
View File

@ -0,0 +1,179 @@
##################
### Build CKAN ###
##################
FROM alpine:3.12 as ckanbuild
# Set CKAN version to build
ENV GIT_URL=https://github.com/ckan/ckan.git
ENV GIT_BRANCH=ckan-2.9.0
# Set src dirs
ENV SRC_DIR=/srv/app/src
ENV PIP_SRC=${SRC_DIR}
WORKDIR ${SRC_DIR}
# Packages to build CKAN requirements and plugins
RUN apk add --no-cache \
git \
curl \
python3 \
postgresql-dev \
linux-headers \
gcc \
make \
g++ \
autoconf \
automake \
libtool \
musl-dev \
pcre-dev \
pcre \
python3-dev \
libffi-dev \
libxml2-dev \
libxslt-dev
# Link python to python3
RUN ln -s /usr/bin/python3 /usr/bin/python
# Create the src directory
RUN mkdir -p ${SRC_DIR}
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python ${SRC_DIR}/get-pip.py
# Downgrade setuptools so that CKAN requirements can be built
RUN pip install setuptools==44.1.0
# Fetch and build CKAN and requirements
RUN pip install -e git+${GIT_URL}@${GIT_BRANCH}#egg=ckan
RUN rm -rf /srv/app/src/ckan/.git
RUN pip wheel --wheel-dir=/wheels -r ckan/requirements.txt
RUN pip wheel --wheel-dir=/wheels uwsgi==2.0.19.1 gevent==20.6.2
###########################
### Default-Extensions ####
###########################
FROM alpine:3.12 as extbuild
# Set src dirs
ENV SRC_DIR=/srv/app/src
ENV PIP_SRC=${SRC_DIR}
# List of default extensions
ENV DEFAULT_EXTENSIONS envvars
# Locations and tags, please use specific tags or revisions
ENV ENVVARS_GIT_URL=https://github.com/okfn/ckanext-envvars
ENV ENVVARS_GIT_BRANCH=0.0.1
RUN apk add --no-cache \
git \
curl \
python3 \
python3-dev
# Link python to python3
RUN ln -s /usr/bin/python3 /usr/bin/python
# Create the src directory
RUN mkdir -p ${SRC_DIR}
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python ${SRC_DIR}/get-pip.py
# Fetch and build the default CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+${ENVVARS_GIT_URL}@${ENVVARS_GIT_BRANCH}#egg=ckanext-envvars
############
### MAIN ###
############
FROM alpine:3.12
MAINTAINER Keitaro Inc <info@keitaro.com>
ENV APP_DIR=/srv/app
ENV SRC_DIR=/srv/app/src
ENV DATA_DIR=/srv/app/data
ENV PIP_SRC=${SRC_DIR}
ENV CKAN_SITE_URL=http://localhost:5000
ENV CKAN__PLUGINS envvars image_view text_view recline_view datastore datapusher
WORKDIR ${APP_DIR}
# Install necessary packages to run CKAN
RUN apk add --no-cache git \
bash \
gettext \
curl \
postgresql-client \
python3 \
libmagic \
pcre \
libxslt \
libxml2 \
tzdata \
apache2-utils && \
# Create SRC_DIR
mkdir -p ${SRC_DIR} && \
# Link python to python3
ln -s /usr/bin/python3 /usr/bin/python
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python ${SRC_DIR}/get-pip.py
# Get artifacts from build stages
COPY --from=ckanbuild /wheels /srv/app/wheels
COPY --from=extbuild /wheels /srv/app/ext_wheels
COPY --from=ckanbuild /srv/app/src/ckan /srv/app/src/ckan
# Additional install steps for build stages artifacts
RUN pip install --no-index --find-links=/srv/app/wheels uwsgi gevent
# Create a local user and group to run the app
RUN addgroup -g 92 -S ckan && \
adduser -u 92 -h /srv/app -H -D -S -G ckan ckan
# Install CKAN
RUN pip install -e /srv/app/src/ckan && \
cd ${SRC_DIR}/ckan && \
cp who.ini ${APP_DIR} && \
pip install --no-index --find-links=/srv/app/wheels -r requirements.txt && \
# Install default CKAN extensions
pip install --no-index --find-links=/srv/app/ext_wheels ckanext-envvars && \
# Create and update CKAN config
# Set timezone
echo "UTC" > /etc/timezone && \
# Generate CKAN config
ckan generate config ${APP_DIR}/production.ini && \
# Not working atm since ckan config tool tries to load config before executing config-tool, workaround
# ckan -c ${APP_DIR}/production.ini config-tool "ckan.plugins = ${CKAN__PLUGINS}" && \
sed -i "/ckan.plugins = stats/c ckan.plugins = ${CKAN__PLUGINS}" ${APP_DIR}/production.ini && \
# Create the data directory
mkdir ${DATA_DIR} && \
# Webassets can't be loaded from env variables at runtime, it needs to be in the config so that it is created
sed -i "/ckan.webassets.path = /c ckan.webassets.path = ${DATA_DIR}/webassets" ${APP_DIR}/production.ini && \
# Change ownership to app user
chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/wheels /srv/app/ext_wheels
# Copy necessary scripts
COPY setup/app ${APP_DIR}
# Create entrypoint directory for children image scripts
ONBUILD RUN mkdir docker-entrypoint.d
EXPOSE 5000
HEALTHCHECK --interval=10s --timeout=5s --retries=5 CMD curl --fail http://localhost:5000/api/3/action/status_show || exit 1
USER ckan
CMD ["/srv/app/start_ckan.sh"]

View File

@ -0,0 +1,4 @@
#!/bin/bash
# this is called before uwsgi is executed
# uset his to add extra scripts before ckan is started

View File

@ -0,0 +1,182 @@
import os
import sys
import subprocess
import psycopg2
import urllib.request, urllib.error, urllib.parse
import re
import time
ckan_ini = os.environ.get('CKAN_INI', '/srv/app/production.ini')
RETRY = 5
def check_db_connection(retry=None):
print('[prerun] Start check_db_connection...')
if retry is None:
retry = RETRY
elif retry == 0:
print('[prerun] Giving up after 5 tries...')
sys.exit(1)
conn_str = os.environ.get('CKAN_SQLALCHEMY_URL', '')
try:
connection = psycopg2.connect(conn_str)
except psycopg2.Error as e:
print((str(e)))
print('[prerun] Unable to connect to the database...try again in a while.')
import time
time.sleep(10)
check_db_connection(retry = retry - 1)
else:
connection.close()
def check_solr_connection(retry=None):
print('[prerun] Start check_solr_connection...')
if retry is None:
retry = RETRY
elif retry == 0:
print('[prerun] Giving up after 5 tries...')
sys.exit(1)
url = os.environ.get('CKAN_SOLR_URL', '')
search_url = '{url}/select/?q=*&wt=json'.format(url=url)
try:
connection = urllib.request.urlopen(search_url)
except urllib.error.URLError as e:
print('[prerun] Unable to connect to solr...try again in a while.')
import time
time.sleep(10)
check_solr_connection(retry = retry - 1)
else:
import re
conn_info = connection.read()
print(conn_info)
eval(conn_info)
def init_db():
print('[prerun] Start init_db...')
db_command = ['ckan', '-c', ckan_ini, 'db', 'init']
print('[prerun] Initializing or upgrading db - start using ckan db init')
try:
# run init scripts
subprocess.check_output(db_command, stderr=subprocess.STDOUT)
print('[prerun] Initializing or upgrading db - end')
except subprocess.CalledProcessError as e:
if 'OperationalError' in str(e.output):
print(e.output.decode('utf-8'))
print('[prerun] Database not ready, waiting a bit before exit...')
import time
time.sleep(5)
sys.exit(1)
else:
print(e.output.decode('utf-8'))
raise e
print('[prerun] Initializing or upgrading db - finish')
def init_datastore():
conn_str = os.environ.get('CKAN_DATASTORE_WRITE_URL')
if not conn_str:
print('[prerun] Skipping datastore initialization')
return
datastore_perms_command = ['ckan', '-c', ckan_ini, 'datastore',
'set-permissions']
connection = psycopg2.connect(conn_str)
cursor = connection.cursor()
print('[prerun] Initializing datastore db - start')
try:
datastore_perms = subprocess.Popen(
datastore_perms_command,
stdout=subprocess.PIPE)
perms_sql = datastore_perms.stdout.read()
# Remove internal pg command as psycopg2 does not like it
perms_sql = re.sub('\\\\connect \"(.*)\"', '', perms_sql.decode('utf-8'))
cursor.execute(perms_sql)
for notice in connection.notices:
print(notice)
connection.commit()
print('[prerun] Initializing datastore db - end')
print((datastore_perms.stdout.read()))
except psycopg2.Error as e:
print('[prerun] Could not initialize datastore')
print(e.decode('utf-8'))
except subprocess.CalledProcessError as e:
if 'OperationalError' in str(e.output):
print(e.output.decode('utf-8'))
print('[prerun] Database not ready, waiting a bit before exit...')
time.sleep(5)
sys.exit(1)
else:
print(e.output.decode('utf-8'))
raise e
finally:
cursor.close()
connection.close()
def create_sysadmin():
print('[prerun] Start create_sysadmin...')
name = os.environ.get('CKAN_SYSADMIN_NAME')
password = os.environ.get('CKAN_SYSADMIN_PASSWORD')
email = os.environ.get('CKAN_SYSADMIN_EMAIL')
if name and password and email:
# Check if user exists
command = ['ckan', '-c', ckan_ini, 'user', 'show', name]
out = subprocess.check_output(command)
if 'User:None' not in re.sub(r'\s', '', out.decode('utf-8')):
print('[prerun] Sysadmin user exists, skipping creation')
return
# Create user
command = ['ckan', '-c', ckan_ini, 'user', 'add',
name,
'password=' + password,
'email=' + email]
subprocess.call(command)
print(('[prerun] Created user {0}'.format(name)))
# Make it sysadmin
command = ['ckan', '-c', ckan_ini, 'sysadmin', 'add',
name]
subprocess.call(command)
print(('[prerun] Made user {0} a sysadmin'.format(name)))
if __name__ == '__main__':
maintenance = os.environ.get('MAINTENANCE_MODE', '').lower() == 'true'
if maintenance:
print('[prerun] Maintenance mode, skipping setup...')
else:
check_db_connection()
check_solr_connection()
init_db()
if os.environ.get('CKAN_DATASTORE_WRITE_URL'):
init_datastore()
create_sysadmin()

View File

@ -0,0 +1,42 @@
#!/bin/bash
# Run any startup scripts provided by images extending this one
if [[ -d "${APP_DIR}/docker-entrypoint.d" ]]
then
for f in ${APP_DIR}/docker-entrypoint.d/*; do
case "$f" in
*.sh) echo "$0: Running init file $f"; . "$f" ;;
*.py) echo "$0: Running init file $f"; python "$f"; echo ;;
*) echo "$0: Ignoring $f (not an sh or py file)" ;;
esac
echo
done
fi
# Set the common uwsgi options
UWSGI_OPTS="--socket /tmp/uwsgi.sock --uid ckan --gid ckan --http :5000 --master --enable-threads --wsgi-file /srv/app/wsgi.py --module wsgi:application --lazy-apps --gevent 2000 -p 2 -L --gevent-early-monkey-patch --vacuum --harakiri 50 --callable application"
# Run the prerun script to init CKAN and create the default admin user
python prerun.py
# Check whether http basic auth password protection is enabled and enable basicauth routing on uwsgi respecfully
if [ $? -eq 0 ]
then
if [ "$PASSWORD_PROTECT" = true ]
then
if [ "$HTPASSWD_USER" ] || [ "$HTPASSWD_PASSWORD" ]
then
# Generate htpasswd file for basicauth
htpasswd -d -b -c /srv/app/.htpasswd $HTPASSWD_USER $HTPASSWD_PASSWORD
# Start uwsgi with basicauth
uwsgi --ini /srv/app/uwsgi.conf --pcre-jit $UWSGI_OPTS
else
echo "Missing HTPASSWD_USER or HTPASSWD_PASSWORD environment variables. Exiting..."
exit 1
fi
else
# Start uwsgi
uwsgi $UWSGI_OPTS
fi
else
echo "[prerun] failed...not starting CKAN."
fi

View File

@ -0,0 +1,2 @@
[uwsgi]
route = ^(?!/api).*$ basicauth:Restricted,/srv/app/.htpasswd

View File

@ -0,0 +1,12 @@
# -*- coding: utf-8 -*-
import os
from ckan.config.middleware import make_app
from ckan.cli import CKANConfigLoader
from logging.config import fileConfig as loggingFileConfig
config_filepath = os.path.join(
os.path.dirname(os.path.abspath(__file__)), u'production.ini')
abspath = os.path.join(os.path.dirname(os.path.abspath(__file__)))
loggingFileConfig(config_filepath)
config = CKANConfigLoader(config_filepath).get_config()
application = make_app(config)

View File

@ -0,0 +1,100 @@
#############
### Build ###
#############
FROM alpine:3.12 as build
# Set datapusher version to build
ENV GIT_URL https://github.com/keitaroinc/datapusher.git
ENV GIT_BRANCH parametrize-job-config
ENV REQUIREMENTS_URL https://raw.githubusercontent.com/keitaroinc/datapusher/${GIT_BRANCH}/requirements.txt
# Set src dirs
ENV SRC_DIR=/srv/app/src
ENV PIP_SRC=${SRC_DIR}
WORKDIR ${SRC_DIR}
# Packages to build datapusher
RUN apk add --no-cache \
python3 \
curl \
gcc \
make \
g++ \
autoconf \
automake \
libtool \
git \
musl-dev \
python3-dev \
libffi-dev \
openssl-dev \
libxml2-dev \
libxslt-dev
# Create the src directory
RUN mkdir -p ${SRC_DIR}
# Install pip
RUN curl -o ${SRC_DIR}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python3 ${SRC_DIR}/get-pip.py
# Fetch and build datapusher and requirements
RUN pip wheel --wheel-dir=/wheels git+${GIT_URL}@${GIT_BRANCH}#egg=datapusher
RUN pip wheel --wheel-dir=/wheels -r ${REQUIREMENTS_URL}
RUN curl -o /wheels/requirements.txt ${REQUIREMENTS_URL}
# Get uwsgi and gevent from pip
RUN pip wheel --wheel-dir=/wheels uwsgi==2.0.19.1 gevent==20.6.2
############
### MAIN ###
############
FROM alpine:3.12
MAINTAINER Keitaro Inc <info@keitaro.com>
ENV APP_DIR=/srv/app
ENV JOB_CONFIG ${APP_DIR}/datapusher_settings.py
WORKDIR ${APP_DIR}
RUN apk add --no-cache \
python3 \
curl \
libmagic \
libxslt
# Install pip
RUN curl -o ${src_dir}/get-pip.py https://bootstrap.pypa.io/get-pip.py && \
python3 ${src_dir}/get-pip.py
# Get artifacts from build stages
COPY --from=build /wheels /srv/app/wheels
# Install uwsgi and gevent
RUN pip install --no-index --find-links=/srv/app/wheels uwsgi gevent
# Create a local user and group to run the app
RUN addgroup -g 92 -S ckan && \
adduser -u 92 -h /srv/app -H -D -S -G ckan ckan
# Install datapusher
RUN pip install --no-index --find-links=/srv/app/wheels datapusher && \
pip install --no-index --find-links=/srv/app/wheels -r /srv/app/wheels/requirements.txt && \
# Set timezone
echo "UTC" > /etc/timezone && \
# Change ownership to app user
chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/wheels
COPY setup ${APP_DIR}
EXPOSE 8000
USER ckan
CMD ["uwsgi", "--socket=/tmp/uwsgi.sock", "--uid=92", "--gid=92", "--http=:8000", "--master", "--enable-threads", "--gevent=2000", "-p 2", "-L", "--wsgi-file=wsgi.py"]

View File

@ -0,0 +1,33 @@
import uuid
import os
DEBUG = False
TESTING = False
SECRET_KEY = str(uuid.uuid4())
USERNAME = str(uuid.uuid4())
PASSWORD = str(uuid.uuid4())
NAME = 'datapusher'
# database
SQLALCHEMY_DATABASE_URI = 'sqlite:////tmp/job_store.db'
# webserver host and port
HOST = '0.0.0.0'
PORT = 8000
# logging
#FROM_EMAIL = 'server-error@example.com'
#ADMINS = ['yourname@example.com'] # where to send emails
#LOG_FILE = '/tmp/ckan_service.log'
STDERR = True
# Content length settings
MAX_CONTENT_LENGTH = int(os.environ.get('DATAPUSHER_MAX_CONTENT_LENGTH', '1024000'))
CHUNK_SIZE = int(os.environ.get('DATAPUSHER_CHUNK_SIZE', '16384'))
CHUNK_INSERT_ROWS = int(os.environ.get('DATAPUSHER_CHUNK_INSERT_ROWS', '250'))
DOWNLOAD_TIMEOUT = int(os.environ.get('DATAPUSHER_DOWNLOAD_TIMEOUT', '30'))

View File

@ -0,0 +1,9 @@
import os
import sys
import ckanserviceprovider.web as web
web.init()
import datapusher.jobs as jobs
application = web.app