[#15] Reenable the Solr backend
It can be used against CKAN core 2.0.1 Conflicts: ckanext/spatial/plugin.py
This commit is contained in:
parent
e03db4901c
commit
9976b6e96f
130
README.rst
|
@ -55,14 +55,101 @@ To enable the spatial query you need to add the ``spatial_query`` plugin to your
|
||||||
ini file (See `Configuration`_). This plugin requires the ``spatial_metadata``
|
ini file (See `Configuration`_). This plugin requires the ``spatial_metadata``
|
||||||
plugin.
|
plugin.
|
||||||
|
|
||||||
|
Several backends are supported for the spatial search. It is important
|
||||||
|
to understand their differences and the necessary setup required when choosing
|
||||||
|
which one to use. The backend to use is defined with the configuration option
|
||||||
|
``ckanext.spatial.search_backend``, e.g.::
|
||||||
|
|
||||||
|
ckanext.spatial.search_backend = solr
|
||||||
|
|
||||||
|
The following table summarizes the different spatial search backends:
|
||||||
|
|
||||||
|
+------------------------+---------------+-------------------------------------+-----------------------------------------------------------+-------------------------------------------+
| Backend                | Solr Versions | Supported geometries                | Sorting and relevance                                     | Performance with large number of datasets |
+========================+===============+=====================================+===========================================================+===========================================+
| ``solr``               | 3.1 to 4.x    | Bounding Box                        | Yes, spatial sorting combined with other query parameters | Good                                      |
+------------------------+---------------+-------------------------------------+-----------------------------------------------------------+-------------------------------------------+
| ``solr-spatial-field`` | 4.x           | Bounding Box, Point and Polygon (1) | Not implemented                                           | Good                                      |
+------------------------+---------------+-------------------------------------+-----------------------------------------------------------+-------------------------------------------+
| ``postgis``            | 1.3 to 4.x    | Bounding Box                        | Partial, only spatial sorting supported (2)               | Poor                                      |
+------------------------+---------------+-------------------------------------+-----------------------------------------------------------+-------------------------------------------+
|
||||||
|
|
||||||
|
(1) Requires JTS
|
||||||
|
(2) Needs ``ckanext.spatial.use_postgis_sorting`` set to True
|
||||||
|
|
||||||
|
|
||||||
|
We recommend using the ``solr`` backend whenever possible. Here are more
|
||||||
|
details about the available options:
|
||||||
|
|
||||||
|
* ``solr`` (Recommended)
|
||||||
|
This option uses normal Solr fields to index the relevant bits of
|
||||||
|
information about the geometry and uses a scoring function to
|
||||||
|
sort results by relevance, keeping any other non-spatial filtering. It only
|
||||||
|
supports bounding boxes both for the geometries to be indexed and the input
|
||||||
|
query shape. It requires the `EDisMax`_ query parser, so it will only work on
Solr 3.1 or later (we recommend using Solr 4.x).
|
||||||
|
|
||||||
|
You will need to add the following fields to your Solr schema file to enable it::
|
||||||
|
|
||||||
    <fields>
        <!-- ... -->
        <field name="bbox_area" type="float" indexed="true" stored="true" />
        <field name="maxx" type="float" indexed="true" stored="true" />
        <field name="maxy" type="float" indexed="true" stored="true" />
        <field name="minx" type="float" indexed="true" stored="true" />
        <field name="miny" type="float" indexed="true" stored="true" />
    </fields>
|
||||||
|
|
||||||
|
|
||||||
|
* ``solr-spatial-field``
|
||||||
|
This option uses the `spatial field <http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4>`_
|
||||||
|
introduced in Solr 4, which allows indexing points, rectangles and more
|
||||||
|
complex geometries (complex geometries will require `JTS`_, check the
|
||||||
|
documentation). Sorting has not yet been implemented; users who need it
|
||||||
|
will need to modify the query using the ``before_search`` extension point.
|
||||||
|
|
||||||
|
You will need to add the following field type and field to your Solr schema
|
||||||
|
file to enable it (Check the Solr documentation for more information on
|
||||||
|
the different parameters, note that you don't need ``spatialContextFactory`` if
|
||||||
|
you are not using JTS)::
|
||||||
|
|
||||||
    <types>
        <!-- ... -->
        <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
            spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
            distErrPct="0.025"
            maxDistErr="0.000009"
            units="degrees"
        />
    </types>
    <fields>
        <!-- ... -->
        <field name="spatial_geom" type="location_rpt" indexed="true" stored="true" multiValued="true" />
    </fields>
|
||||||
|
|
||||||
|
* ``postgis``
|
||||||
|
This is the original implementation of the spatial search. It does not
|
||||||
|
require any change in the Solr schema and can run on Solr 1.x, but it is
|
||||||
|
not as efficient as the previous ones. Basically the bounding box based
|
||||||
|
query is performed in PostGIS first, and the ids of the matched datasets
|
||||||
|
are added as a filter to the Solr request. This, apart from being much
|
||||||
|
less efficient, can lead to issues on Solr due to the size of the requests (See
|
||||||
|
`Solr configuration issues on legacy PostGIS backend`_). There is support
|
||||||
|
for a spatial ranking on this backend (setting
|
||||||
|
``ckanext.spatial.use_postgis_sorting`` to True on the ini file), but it
|
||||||
|
cannot be combined with any other filtering.
|
||||||
|
|
||||||
|
|
||||||
|
.. _edismax: http://wiki.apache.org/solr/ExtendedDisMax
|
||||||
|
.. _JTS: http://www.vividsolutions.com/jts/JTSHome.htm
|
||||||
|
|
||||||
|
|
||||||
Geo-Indexing your datasets
|
Geo-Indexing your datasets
|
||||||
++++++++++++++++++++++++++
|
++++++++++++++++++++++++++
|
||||||
|
|
||||||
In order to make a dataset queryable by location, a special extra must
|
Regardless of the backend that you are using, in order to make a dataset
|
||||||
be defined, with its key named 'spatial'. The value must be a valid GeoJSON_
|
queryable by location, a special extra must be defined, with its key named
|
||||||
geometry, for example::
|
'spatial'. The value must be a valid GeoJSON_ geometry, for example::
|
||||||
|
|
||||||
{"type":"Polygon","coordinates":[[[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]]}
|
{"type":"Polygon","coordinates":[[[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]]}
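As a minimal sketch (not part of the extension), a bounding box could be turned into the GeoJSON value expected in the ``spatial`` extra as shown here. The helper name ``bbox_to_geojson`` is hypothetical, and the ring order is an assumption chosen to be closed and counter-clockwise:

```python
import json


def bbox_to_geojson(minx, miny, maxx, maxy):
    """Return a GeoJSON Polygon string describing a bounding box.

    Hypothetical helper for illustration only. The ring is closed
    (first point repeated at the end) and listed counter-clockwise.
    """
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


# Same extent as the README example above
spatial_extra = bbox_to_geojson(-6.41736, 49.8625, 2.05827, 55.7447)
```

The resulting string is what would be stored as the value of the ``spatial`` extra on the dataset.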
|
||||||
|
|
||||||
|
@ -102,14 +189,43 @@ or with a GeoJSON object describing a bounding box (note the escaped quotes)::
|
||||||
|
|
||||||
{% snippet "spatial/snippets/spatial_query.html", default_extent="{ \"type\": \"Polygon\", \"coordinates\": [[[74.89, 29.39],[74.89, 38.45], [60.50, 38.45], [60.50, 29.39], [74.89, 29.39]]]}" %}
|
{% snippet "spatial/snippets/spatial_query.html", default_extent="{ \"type\": \"Polygon\", \"coordinates\": [[[74.89, 29.39],[74.89, 38.45], [60.50, 38.45], [60.50, 29.39], [74.89, 29.39]]]}" %}
|
||||||
|
|
||||||
You need to load the ``spatial_metadata`` and ``spatial_query`` plugins to use this snippet.
|
You need to load the ``spatial_metadata`` and ``spatial_query`` plugins to use this snippet.
|
||||||
|
|
||||||
|
|
||||||
|
Solr configuration issues on legacy PostGIS backend
|
||||||
|
+++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
|
If you find any of the issues described in this section it is strongly
|
||||||
|
suggested that you consider switching to one of the Solr based backends
|
||||||
|
which are much more efficient. These notes are just kept for informative
|
||||||
|
purposes.
|
||||||
|
|
||||||
|
|
||||||
|
When using the Spatial Query functionality with this backend, an additional Solr/Lucene setting limits the number of datasets that can be searched spatially.
|
||||||
|
|
||||||
|
The setting is ``maxBooleanClauses`` in ``solrconfig.xml`` and its value is the maximum number of datasets that can be searched spatially. The default is ``1024`` and this could be increased to, say, ``16384``. For a single-core Solr setup the file will probably be at ``/etc/solr/conf/solrconfig.xml``. For a multiple-core setup, there will be several ``solrconfig.xml`` files a couple of levels below ``/etc/solr``. In that case, *all* of the cores' ``solrconfig.xml`` files should have this setting at the new value.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
<maxBooleanClauses>16384</maxBooleanClauses>
|
||||||
|
|
||||||
|
This setting is needed because PostGIS spatial query results are fed into Solr as a Boolean expression, and the parser for that expression has a limit on the number of clauses. If your spatial area contains more datasets than the limit (1024 by default), you will get this error::
|
||||||
|
|
||||||
|
Dataset search error: ('SOLR returned an error running query...
|
||||||
|
|
||||||
|
and in the SOLR logs you see::
|
||||||
|
|
||||||
|
too many boolean clauses
|
||||||
|
...
|
||||||
|
Caused by: org.apache.lucene.search.BooleanQuery$TooManyClauses:
|
||||||
|
maxClauseCount is set to 1024
|
||||||
|
|
||||||
|
|
||||||
Legacy API
|
Legacy API
|
||||||
++++++++++
|
++++++++++
|
||||||
|
|
||||||
.. note:: This API endpoint will be deprecated on future versions of CKAN and
|
|
||||||
should not be used.
|
|
||||||
|
|
||||||
The extension adds the following call to the CKAN search API, which returns
|
The extension adds the following call to the CKAN search API, which returns
|
||||||
datasets with an extent that intersects with the bounding box provided::
|
datasets with an extent that intersects with the bounding box provided::
|
||||||
|
|
||||||
|
|
|
@ -114,7 +114,7 @@ class SpatialHarvester(HarvesterBase):
|
||||||
force_import = False
|
force_import = False
|
||||||
|
|
||||||
extent_template = Template('''
|
extent_template = Template('''
|
||||||
{"type": "Polygon", "coordinates": [[[$xmin, $ymin], [$xmin, $ymax], [$xmax, $ymax], [$xmax, $ymin], [$xmin, $ymin]]]}
|
{"type": "Polygon", "coordinates": [[[$xmin, $ymin], [$xmax, $ymin], [$xmax, $ymax], [$xmin, $ymax], [$xmin, $ymin]]]}
|
||||||
''')
|
''')
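The reordered template lists the bounding box ring counter-clockwise. As a rough sketch (independent of the harvester, which does not contain this code), ring orientation can be checked without Shapely using the shoelace formula. The helper name ``is_ccw`` is hypothetical:

```python
def is_ccw(ring):
    """Return True if a closed ring of (x, y) points is counter-clockwise.

    Uses the shoelace formula: twice the signed area is positive for a
    counter-clockwise ring and negative for a clockwise one.
    """
    area2 = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area2 += x1 * y2 - x2 * y1
    return area2 > 0


xmin, ymin, xmax, ymax = 0, 0, 2, 1

# Point order used by the old template
old_ring = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]
# Point order used by the new template
new_ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
```

With these values ``is_ccw(old_ring)`` is false and ``is_ccw(new_ring)`` is true, which matches the direction of the change in the template.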
|
||||||
|
|
||||||
## IHarvester
|
## IHarvester
|
||||||
|
|
|
@ -4,6 +4,8 @@ from logging import getLogger
|
||||||
|
|
||||||
from pylons import config
|
from pylons import config
|
||||||
|
|
||||||
|
import shapely.geometry
|
||||||
|
|
||||||
from ckan import plugins as p
|
from ckan import plugins as p
|
||||||
|
|
||||||
from ckan.lib.search import SearchError, PackageSearchQuery
|
from ckan.lib.search import SearchError, PackageSearchQuery
|
||||||
|
@ -118,6 +120,13 @@ class SpatialQuery(p.SingletonPlugin):
|
||||||
|
|
||||||
p.implements(p.IRoutes, inherit=True)
|
p.implements(p.IRoutes, inherit=True)
|
||||||
p.implements(p.IPackageController, inherit=True)
|
p.implements(p.IPackageController, inherit=True)
|
||||||
|
p.implements(p.IConfigurable, inherit=True)
|
||||||
|
|
||||||
|
search_backend = None
|
||||||
|
|
||||||
|
def configure(self, config):
|
||||||
|
|
||||||
|
self.search_backend = config.get('ckanext.spatial.search_backend', 'postgis')
|
||||||
|
|
||||||
def before_map(self, map):
|
def before_map(self, map):
|
||||||
|
|
||||||
|
@ -126,14 +135,145 @@ class SpatialQuery(p.SingletonPlugin):
|
||||||
action='spatial_query')
|
action='spatial_query')
|
||||||
return map
|
return map
|
||||||
|
|
||||||
|
    def before_index(self, pkg_dict):

        if pkg_dict.get('extras_spatial', None) and self.search_backend in ('solr', 'solr-spatial-field'):
            try:
                geometry = json.loads(pkg_dict['extras_spatial'])
            except ValueError:
                log.error('Geometry not valid GeoJSON, not indexing')
                return pkg_dict

            if self.search_backend == 'solr':
                # Only bbox supported for this backend
                if not (geometry['type'] == 'Polygon'
                        and len(geometry['coordinates']) == 1
                        and len(geometry['coordinates'][0]) == 5):
                    log.error('Solr backend only supports bboxes, ignoring geometry {0}'.format(pkg_dict['extras_spatial']))
                    return pkg_dict

                # Vertices 0 and 2 are opposite corners of the rectangle
                coords = geometry['coordinates']
                pkg_dict['maxy'] = max(coords[0][2][1], coords[0][0][1])
                pkg_dict['miny'] = min(coords[0][2][1], coords[0][0][1])
                pkg_dict['maxx'] = max(coords[0][2][0], coords[0][0][0])
                pkg_dict['minx'] = min(coords[0][2][0], coords[0][0][0])
                pkg_dict['bbox_area'] = (pkg_dict['maxx'] - pkg_dict['minx']) * \
                                        (pkg_dict['maxy'] - pkg_dict['miny'])

            elif self.search_backend == 'solr-spatial-field':
                wkt = None

                # Check potential problems with bboxes
                if geometry['type'] == 'Polygon' \
                   and len(geometry['coordinates']) == 1 \
                   and len(geometry['coordinates'][0]) == 5:

                    # Degenerate bbox (all ring points identical): index it as a point.
                    # The loop variable is not called ``p`` to avoid shadowing the
                    # ``ckan.plugins`` alias.
                    xs = [pt[0] for pt in geometry['coordinates'][0]]
                    ys = [pt[1] for pt in geometry['coordinates'][0]]

                    if xs.count(xs[0]) == 5 and ys.count(ys[0]) == 5:
                        wkt = 'POINT({x} {y})'.format(x=xs[0], y=ys[0])
                    else:
                        # Check if coordinates are defined counter-clockwise,
                        # otherwise we'll get wrong results from Solr
                        lr = shapely.geometry.polygon.LinearRing(geometry['coordinates'][0])
                        if not lr.is_ccw:
                            lr.coords = list(lr.coords)[::-1]
                        polygon = shapely.geometry.polygon.Polygon(lr)
                        wkt = polygon.wkt

                if not wkt:
                    # Any other geometry: validate it and index its WKT form
                    shape = shapely.geometry.asShape(geometry)
                    if not shape.is_valid:
                        log.error('Wrong geometry, not indexing')
                        return pkg_dict
                    wkt = shape.wkt

                pkg_dict['spatial_geom'] = wkt

        return pkg_dict
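The ``solr`` branch of ``before_index`` can be tried outside CKAN with a sketch like the following. ``bbox_fields`` is a hypothetical standalone helper, not part of the plugin, and unlike the method above it takes the minimum and maximum over all ring points rather than only vertices 0 and 2:

```python
import json


def bbox_fields(geojson_text):
    """Derive the Solr bbox fields from a GeoJSON bounding box.

    Returns None when the geometry is not a single-ring, five-point
    Polygon, mirroring the check done for the ``solr`` backend.
    """
    geometry = json.loads(geojson_text)
    is_bbox = (geometry.get('type') == 'Polygon'
               and len(geometry['coordinates']) == 1
               and len(geometry['coordinates'][0]) == 5)
    if not is_bbox:
        return None

    ring = geometry['coordinates'][0]
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    fields = {'minx': min(xs), 'maxx': max(xs),
              'miny': min(ys), 'maxy': max(ys)}
    fields['bbox_area'] = ((fields['maxx'] - fields['minx']) *
                           (fields['maxy'] - fields['miny']))
    return fields


example = ('{"type":"Polygon","coordinates":[[[2.05827, 49.8625],'
           '[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625],'
           ' [2.05827, 49.8625]]]}')
fields = bbox_fields(example)
```

The keys of the returned dictionary match the field names that have to be declared in the Solr schema for this backend.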
|
||||||
|
|
||||||
def before_search(self, search_params):
|
def before_search(self, search_params):
|
||||||
if 'extras' in search_params and 'ext_bbox' in search_params['extras'] \
|
if search_params.get('extras', None) and search_params['extras'].get('ext_bbox', None):
|
||||||
and search_params['extras']['ext_bbox']:
|
|
||||||
|
|
||||||
bbox = validate_bbox(search_params['extras']['ext_bbox'])
|
bbox = validate_bbox(search_params['extras']['ext_bbox'])
|
||||||
if not bbox:
|
if not bbox:
|
||||||
raise SearchError('Wrong bounding box provided')
|
raise SearchError('Wrong bounding box provided')
|
||||||
|
|
||||||
|
            if self.search_backend == 'solr':
                search_params = self._params_for_solr_search(bbox, search_params)
            elif self.search_backend == 'solr-spatial-field':
                search_params = self._params_for_solr_spatial_field_search(bbox, search_params)
            elif self.search_backend == 'postgis':
                search_params = self._params_for_postgis_search(bbox, search_params)

        return search_params
|
||||||
|
|
||||||
|
    def _params_for_solr_search(self, bbox, search_params):
        '''
        This will add the following parameters to the query:

            defType - edismax (We need to define EDisMax to use bf)
            bf - {function} A boost function to influence the score (thus
                 influencing the sorting). The algorithm can be basically defined as:

                    2 * X / (Q + T)

                 Where X is the area of the intersection between the query
                 area Q and the target geometry area T. It gives a ratio from
                 0 to 1 where 0 means no overlap at all and 1 a perfect fit

            fq - Adds a filter that forces the value returned by the previous
                 function to be between 0 and 1, effectively applying the
                 spatial filter.

        '''

        # x1*/y1* are the query bbox values, x2*/y2* are Solr field names
        variables = dict(
            x11=bbox['minx'],
            x12=bbox['maxx'],
            y11=bbox['miny'],
            y12=bbox['maxy'],
            x21='minx',
            x22='maxx',
            y21='miny',
            y22='maxy',
            area_search=abs(bbox['maxx'] - bbox['minx']) * abs(bbox['maxy'] - bbox['miny'])
        )

        bf = '''div(
                   mul(
                   mul(max(0, sub(min({x12},{x22}) , max({x11},{x21}))),
                       max(0, sub(min({y12},{y22}) , max({y11},{y21})))
                       ),
                   2),
                   add({area_search}, mul(sub({y22}, {y21}), sub({x22}, {x21})))
                )'''.format(**variables).replace('\n','').replace(' ','')

        # Append rather than overwrite, so any existing filter queries are kept
        search_params['fq_list'] = search_params.get('fq_list', [])
        search_params['fq_list'].append('{!frange incl=false l=0 u=1}%s' % bf)

        search_params['bf'] = bf
        search_params['defType'] = 'edismax'

        return search_params
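To make the boost function easier to reason about, here is a plain Python sketch of the same ratio, 2 * X / (Q + T), computed for two bounding boxes. The function name ``overlap_ratio`` and the tuple layout are assumptions for illustration; in the plugin the computation is done by Solr function queries, not in Python:

```python
def overlap_ratio(query, target):
    """Return 2 * X / (Q + T) for two (minx, miny, maxx, maxy) boxes.

    X is the area of the intersection, Q the area of the query box and
    T the area of the target box. Assumes at least one box has a
    non-zero area, otherwise the division fails.
    """
    qminx, qminy, qmaxx, qmaxy = query
    tminx, tminy, tmaxx, tmaxy = target

    # Width and height of the intersection, clamped at 0 when disjoint
    width = max(0, min(qmaxx, tmaxx) - max(qminx, tminx))
    height = max(0, min(qmaxy, tmaxy) - max(qminy, tminy))
    intersection = width * height

    area_query = abs(qmaxx - qminx) * abs(qmaxy - qminy)
    area_target = (tmaxx - tminx) * (tmaxy - tminy)

    return 2.0 * intersection / (area_query + area_target)


perfect_fit = overlap_ratio((0, 0, 2, 2), (0, 0, 2, 2))
half_overlap = overlap_ratio((0, 0, 2, 2), (1, 0, 3, 2))
no_overlap = overlap_ratio((0, 0, 1, 1), (5, 5, 6, 6))
```

A dataset whose ratio is 0 does not intersect the query box at all, which is why a filter on this value also works as the spatial filter.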
|
||||||
|
|
||||||
|
    def _params_for_solr_spatial_field_search(self, bbox, search_params):
        '''
        This will add an fq filter with the form:

            +spatial_geom:"Intersects({minx} {miny} {maxx} {maxy})"

        '''
        search_params['fq_list'] = search_params.get('fq_list', [])
        search_params['fq_list'].append('+spatial_geom:"Intersects({minx} {miny} {maxx} {maxy})"'
            .format(minx=bbox['minx'], miny=bbox['miny'], maxx=bbox['maxx'], maxy=bbox['maxy']))

        return search_params
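For reference, the filter string built by this method can be reproduced in isolation. This is only a sketch: ``intersects_filter`` is a hypothetical helper and the bbox values are made up for the example:

```python
def intersects_filter(bbox, field='spatial_geom'):
    """Build the Solr filter query for a bbox dict.

    ``bbox`` is expected to have the keys minx, miny, maxx and maxy.
    """
    return '+{field}:"Intersects({minx} {miny} {maxx} {maxy})"'.format(
        field=field, **bbox)


fq = intersects_filter({'minx': -6.4, 'miny': 49.8, 'maxx': 2.1, 'maxy': 55.7})
```

Appending the string to ``fq_list`` rather than replacing the list keeps any filters that other plugins have already added.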
|
||||||
|
|
||||||
|
def _params_for_postgis_search(self, bbox, search_params):
|
||||||
|
|
||||||
# Note: This will be deprecated at some point in favour of the
|
# Note: This will be deprecated at some point in favour of the
|
||||||
# Solr 4 spatial sorting capabilities
|
# Solr 4 spatial sorting capabilities
|
||||||
if search_params['sort'] == 'spatial desc' and \
|
if search_params['sort'] == 'spatial desc' and \
|
||||||
|
|