Geoportal MCP Server
Python MCP server (Docker-first) for Geoportal read-only APIs with OIDC Client Credentials authentication. It provides a robust RAG pipeline (Qdrant + Ollama) and advanced analytics for Geoportal project data.
Table of Contents
- 🚀 Quick Start (Docker Compose)
- ✨ Features
- 🏗️ Architecture & Layout
- ⚙️ Configuration
- 🧠 Local RAG Pipeline
- 🛠️ MCP Tools
- 🤖 AI Agent Integration
- 💻 Local Development (without Docker)
- 🧪 Testing
- 📝 Notes
- 👤 Authors
🚀 Quick Start (Docker Compose)
The fastest way to run the full stack (MCP Server + Qdrant + Ollama).
- Configure environment:
cp .env.example .env # Edit .env with your OIDC credentials
- Start the stack:
docker compose up --build -d
- Watch logs:
docker compose logs -f geoportal-mcp-server
The server starts with RAG_BOOTSTRAP_MODE=sync by default. If the collection is empty, it will perform a full ingest.
✨ Features
- Standardized Auth: OIDC Client Credentials with Authlib and pre-expiry token refresh.
- RAG Pipeline: Complete lifecycle from fetch to geosemantic search.
- Performance: Optional full-parallel ingest with worker pools and failure tracking.
- Observability: Structured logs for tool invocation and detailed embedding profiling.
- Analytics: Quality scoring, profile comparison, and collection health reports.
- AI Ready: Pre-authored prompts and skills for AI agents (Claude, Gemini, etc.).
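The pre-expiry token refresh listed above can be illustrated with a small cache. This is a minimal sketch using only the standard library, not the project's Authlib-based provider; `TokenCache`, `fetch_token`, and the 30-second `refresh_margin` are hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    # Hypothetical callable: returns (access_token, lifetime_in_seconds).
    fetch_token: Callable[[], Tuple[str, float]]
    # Refresh this many seconds before the real expiry (assumed value).
    refresh_margin: float = 30.0
    # Injectable clock, so the behaviour can be checked without sleeping.
    clock: Callable[[], float] = time.monotonic
    _token: Optional[str] = None
    _expires_at: float = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one inside the refresh margin."""
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.refresh_margin:
            token, lifetime = self.fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Refreshing before the expiry instant, rather than after a request fails, avoids sending a token that expires while the request is in flight.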
🏗️ Architecture & Layout
For a detailed view of the system components and their interactions, see docs/ARCHITECTURE.md.
Startup Workflow
flowchart TD
A[docker compose up] --> B[Start qdrant service]
A --> C[Start ollama service]
C --> D[ollama-model-init pulls model]
B --> E[Start geoportal-mcp-server]
D --> E
E --> F{RAG_BOOTSTRAP_MODE}
F -->|sync| G[Incremental Sync]
F -->|ingest| H[Full Ingest]
F -->|disabled| I[Skip]
G --> J[Start MCP Server]
H --> J
J --> K[Ready on stdio/http]
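The branch on RAG_BOOTSTRAP_MODE in the flowchart, combined with the Quick Start note that an empty collection triggers a full ingest, could be expressed roughly as follows. This is a sketch under assumptions: `choose_bootstrap_action` and the returned action labels are hypothetical, and the real server may order these checks differently.

```python
from enum import Enum


class BootstrapMode(str, Enum):
    SYNC = "sync"
    INGEST = "ingest"
    DISABLED = "disabled"


def choose_bootstrap_action(mode: str, collection_is_empty: bool) -> str:
    """Return 'full_ingest', 'incremental_sync', or 'skip' for a startup mode."""
    # Raises ValueError for a mode outside the three documented values.
    parsed = BootstrapMode(mode.strip().lower())
    if parsed is BootstrapMode.DISABLED:
        return "skip"
    # Assumption: an empty collection upgrades a sync to a full ingest.
    if parsed is BootstrapMode.INGEST or collection_is_empty:
        return "full_ingest"
    return "incremental_sync"
```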
Project Layout
- src/geoportal_mcp/config.py: Pydantic settings from environment.
- src/geoportal_mcp/auth.py: OIDC token provider (D4Science compatible).
- src/geoportal_mcp/rag/: Core RAG logic (normalize, embeddings, qdrant).
- src/geoportal_mcp/rag_cli.py: RAG administration CLI.
- src/geoportal_mcp/rag_profile_cli.py: RAG profile benchmarking and management CLI.
- src/geoportal_mcp/server.py: MCP tool definitions and server bootstrap.
- ai/: Source of truth for agent prompts and skills.
⚙️ Configuration
Use .env as the primary configuration method. See .env.example for the full list of variables and detailed descriptions.
Key Categories
- Connection: GEOPORTAL_API_BASE_URL, GEOPORTAL_SERVICE_URL.
- Auth: OIDC_* variables (support for D4Science UMA and generic OIDC).
- RAG: RAG_DATA_DIR, RAG_CHUNK_SIZE, RAG_BOOTSTRAP_MODE.
- Parallelism: RAG_FULL_PARALLEL_ENABLED, RAG_PARALLEL_WORKERS.
- Backends: QDRANT_URL, RAG_EMBEDDING_URL.
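To show how a few of these variables might be read and typed, here is a simplified standard-library stand-in for the Pydantic settings in config.py. It is only a sketch: `RagSettings` and `load_rag_settings` are hypothetical, and every fallback value except the documented RAG_BOOTSTRAP_MODE=sync is an illustrative assumption, so check .env.example for the real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings used in .env files."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RagSettings:
    bootstrap_mode: str
    chunk_size: int
    full_parallel_enabled: bool
    parallel_workers: int
    qdrant_url: str


def load_rag_settings(env: Mapping[str, str] = os.environ) -> RagSettings:
    """Build typed settings from an environment mapping."""
    return RagSettings(
        bootstrap_mode=env.get("RAG_BOOTSTRAP_MODE", "sync"),  # documented default
        chunk_size=int(env.get("RAG_CHUNK_SIZE", "512")),  # assumed default
        full_parallel_enabled=_as_bool(env.get("RAG_FULL_PARALLEL_ENABLED", "false")),  # assumed
        parallel_workers=int(env.get("RAG_PARALLEL_WORKERS", "4")),  # assumed default
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),  # assumed default
    )
```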
🧠 Local RAG Pipeline
The server manages a local vector index of Geoportal projects to enable semantic search.
CLI Management (geoportal-mcp-rag and geoportal-mcp-rag-profile)
| Command | Description |
|---|---|
| geoportal-mcp-rag ingest | Full data fetch and re-index. |
| geoportal-mcp-rag sync | Incremental update based on content hash. |
| geoportal-mcp-rag query | Test semantic search from CLI. |
| geoportal-mcp-rag reset | Drop the Qdrant collection. |
| geoportal-mcp-rag status | Show collection and backend health. |
| geoportal-mcp-rag enrich-gis-links | Manually trigger GIS link resolution. |
| geoportal-mcp-rag-profile query | Test queries across different embedding profiles. |
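The idea behind an incremental update based on content hash can be sketched as below. This is not the project's sync implementation: `content_hash`, `plan_sync`, and the choice of SHA-256 over canonical JSON are assumptions, as is the handling of documents that have disappeared upstream.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


def content_hash(document: Dict[str, Any]) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the document."""
    canonical = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(
    documents: Dict[str, Dict[str, Any]], stored_hashes: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (ids to re-index, ids to delete) given previously stored hashes."""
    # New or changed documents: the stored hash is missing or differs.
    to_upsert = sorted(
        doc_id
        for doc_id, doc in documents.items()
        if stored_hashes.get(doc_id) != content_hash(doc)
    )
    # Assumption: documents no longer present upstream are removed from the index.
    to_delete = sorted(doc_id for doc_id in stored_hashes if doc_id not in documents)
    return to_upsert, to_delete
```

Sorting keys before hashing makes the digest independent of field order, so only real content changes trigger re-embedding.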
Parallel Ingest
Enable RAG_FULL_PARALLEL_ENABLED=true to process enrich, normalize, embed, and upsert tasks in parallel workers.
- Failures are logged to a thread-safe JSONL file at <RAG_DATA_DIR>/state/parallel-failures.jsonl.
- Control worker count with RAG_PARALLEL_WORKERS.
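A worker pool that records failures to a JSONL file without interleaving writes might look like the sketch below. It assumes a thread pool and a lock-guarded append; `JsonlFailureLog`, `process_in_parallel`, and the record fields are hypothetical and may differ from what the server writes.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


class JsonlFailureLog:
    """Appends one JSON object per line; a lock serializes concurrent writers."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, item_id: str, error: Exception) -> None:
        line = json.dumps({"id": item_id, "error": repr(error)})
        with self._lock:
            with self._path.open("a", encoding="utf-8") as handle:
                handle.write(line + "\n")


def process_in_parallel(
    item_ids: Iterable[str],
    handler: Callable[[str], None],
    failures: JsonlFailureLog,
    workers: int = 4,
) -> int:
    """Run handler for each id in a thread pool; return the number of successes."""

    def run(item_id: str) -> bool:
        try:
            handler(item_id)
            return True
        except Exception as exc:  # log the failure and keep the batch going
            failures.record(item_id, exc)
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run, item_ids))
```

Catching the exception inside the worker keeps one bad document from aborting the batch, and the failure file can be replayed later.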
🛠️ MCP Tools
Standard Tools
- health_check: Server status.
- get_use_case_descriptors: Fetch available UCDs.
- list_projects / get_project: Basic project exploration.
- rag_status: Check indexing and backend availability.
- geosemantic_search: Semantic search with geographic filters (bbox, radius).
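To make the semantics of the bbox and radius filters concrete, the sketch below checks a point against each. It is illustrative only: the server presumably delegates filtering to the vector store, and the function names, the (min_lon, min_lat, max_lon, max_lat) ordering, and the kilometre unit are assumptions.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def in_bbox(lon: float, lat: float, bbox: Tuple[float, float, float, float]) -> bool:
    """True if the point lies inside (min_lon, min_lat, max_lon, max_lat).

    Boxes crossing the antimeridian are not handled in this sketch.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_radius(
    lon: float, lat: float, center_lon: float, center_lat: float, radius_km: float
) -> bool:
    """True if the point is at most radius_km from the centre."""
    return haversine_km(lon, lat, center_lon, center_lat) <= radius_km
```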
Analytics Tools
Computed directly from indexed RAG payloads:
- analytics_collection_summary: Volume, distribution, geo coverage.
- analytics_quality_report: Weighted quality scores per document.
- analytics_compare_profiles: Compare different profile IDs by richness and recency.
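A weighted quality score per document could be computed along these lines. The sketch is purely illustrative: the payload field names and weights in `DEFAULT_WEIGHTS` are hypothetical, and the criteria actually used by analytics_quality_report are not documented in this README.

```python
from typing import Any, Mapping

# Hypothetical field weights; the real report may score different criteria.
DEFAULT_WEIGHTS = {
    "title": 0.3,
    "description": 0.3,
    "geometry": 0.25,
    "updated_at": 0.15,
}


def quality_score(
    payload: Mapping[str, Any], weights: Mapping[str, float] = DEFAULT_WEIGHTS
) -> float:
    """Weighted share of non-empty fields in the payload, between 0 and 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # A field earns its weight only when it is present and non-empty.
    earned = sum(
        weight
        for field, weight in weights.items()
        if payload.get(field) not in (None, "", [], {})
    )
    return earned / total
```

Normalizing by the total weight keeps scores comparable if the weights are later changed.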
🤖 AI Agent Integration
All agent prompts and skills are authored in ai/ and synchronized to client folders (e.g., .claude/).
To sync changes:
python3 ai/adapters/sync_prompts_skills.py
Guided Prompts
Templates available for specialized analysis:
- geoportal-quality-review.prompt.md
- geoportal-profile-comparison.prompt.md
- geoportal-geo-coverage-review.prompt.md
💻 Local Development (without Docker)
- Recreate environment:
rm -rf .venv
python3 -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
- Run in HTTP mode:
MCP_TRANSPORT=streamable-http geoportal-mcp-server
Endpoints available at: /, /mcp, /healthz, /status.
🧪 Testing
# Run all tests
pytest
# Run with explanatory logs (recommended for debugging)
pytest -o log_cli=true --log-cli-level=INFO
Main tests:
- tests/test_auth.py: Token cache and UMA flow validation.
- tests/test_geoportal_client.py: API response handling.
- tests/rag/: Coverage for pipeline, storage, and embeddings.
📝 Notes
- Security: Secrets must come from the environment. /mcp can be protected via MCP_ACCESS_TOKEN.
- D4Science: Use OIDC_GRANT_TYPE=uma-ticket for service account compatibility.
- Ollama: For maximum parallelism, set OLLAMA_NUM_PARALLEL to match your worker count.
👤 Authors
- Francesco Mangiacrapa (ORCID) Istituto di Scienza e Tecnologie dell'Informazione 'A. Faedo', Consiglio Nazionale delle Ricerche, Pisa, Italy
- AI-assisted development:
- OpenAI ChatGPT
- Google Gemini
- Claude