
Geoportal MCP Server

Python MCP server (Docker-first) for the Geoportal read-only APIs, authenticated with OIDC Client Credentials. It includes a local RAG pipeline (Qdrant + Ollama) for semantic search and analytics tools computed over indexed Geoportal project data.


🚀 Quick Start (Docker Compose)

The fastest way to run the full stack (MCP Server + Qdrant + Ollama).

  1. Configure environment:

     ```bash
     cp .env.example .env
     # Edit .env with your OIDC credentials
     ```

  2. Start the stack:

     ```bash
     docker compose up --build -d
     ```

  3. Watch logs:

     ```bash
     docker compose logs -f geoportal-mcp-server
     ```

By default the server starts with RAG_BOOTSTRAP_MODE=sync and runs an incremental sync before serving requests. If the collection is empty, it performs a full ingest instead.
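
The bootstrap decision can be summarized as a small dispatch function. This is a minimal sketch, not the server's code: choose_bootstrap_action and its collection_points argument (standing in for the number of points already stored in the Qdrant collection) are hypothetical; only the three mode values come from this README.

```python
from enum import Enum


class BootstrapMode(str, Enum):
    """Values accepted by RAG_BOOTSTRAP_MODE, as listed in this README."""

    SYNC = "sync"
    INGEST = "ingest"
    DISABLED = "disabled"


def choose_bootstrap_action(mode: str, collection_points: int) -> str:
    """Decide what to run before the MCP server starts (hypothetical helper).

    collection_points stands in for the number of points already stored
    in the Qdrant collection.
    """
    parsed = BootstrapMode(mode.strip().lower())  # ValueError on unknown modes
    if parsed is BootstrapMode.DISABLED:
        return "skip"
    if parsed is BootstrapMode.INGEST:
        return "full_ingest"
    # sync: an empty collection has nothing to diff against, so ingest everything
    if collection_points == 0:
        return "full_ingest"
    return "incremental_sync"


if __name__ == "__main__":
    for mode, points in [("sync", 0), ("sync", 1200), ("ingest", 1200), ("disabled", 0)]:
        print(mode, points, "->", choose_bootstrap_action(mode, points))
```

Rejecting unknown values early (via the enum) surfaces a typo in .env at startup rather than silently skipping the bootstrap.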


Features

  • Standardized Auth: OIDC Client Credentials with Authlib and pre-expiry token refresh.
  • RAG Pipeline: Complete lifecycle from fetch to geosemantic search.
  • Performance: Optional full-parallel ingest with worker pools and failure tracking.
  • Observability: Structured logs for tool invocation and detailed embedding profiling.
  • Analytics: Quality scoring, profile comparison, and collection health reports.
  • AI Ready: Pre-authored prompts and skills for AI agents (Claude, Gemini, etc.).
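
Pre-expiry refresh means a cached token is replaced shortly before it expires, rather than after a request fails. The sketch below shows the idea with the standard library only; it does not use Authlib, and CachedTokenProvider, the fetch callable, and the 30-second refresh_margin are hypothetical. In the real server the fetch step would be an OIDC Client Credentials request to the token endpoint.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical fetcher signature: returns (access_token, expires_in_seconds).
TokenFetcher = Callable[[], Tuple[str, float]]


@dataclass
class CachedTokenProvider:
    """Cache an access token and refresh it shortly before it expires."""

    fetch: TokenFetcher
    refresh_margin: float = 30.0  # seconds; assumed value, not the server's setting
    clock: Callable[[], float] = time.monotonic
    _token: Optional[str] = field(default=None, init=False, repr=False)
    _expires_at: float = field(default=0.0, init=False, repr=False)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def get_token(self) -> str:
        with self._lock:  # avoid concurrent callers fetching several tokens at once
            now = self.clock()
            if self._token is None or now >= self._expires_at - self.refresh_margin:
                token, expires_in = self.fetch()
                self._token = token
                self._expires_at = now + expires_in
            return self._token


if __name__ == "__main__":
    counter = iter(range(1, 100))
    provider = CachedTokenProvider(fetch=lambda: (f"token-{next(counter)}", 300.0))
    print(provider.get_token())  # fetched from the (fake) token endpoint
    print(provider.get_token())  # served from the cache
```

Using a monotonic clock keeps the expiry check immune to wall-clock adjustments, and the injectable clock makes the refresh boundary easy to test.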

🏗️ Architecture & Layout

For a detailed view of the system components and their interactions, see docs/ARCHITECTURE.md.

Startup Workflow

```mermaid
flowchart TD
  A[docker compose up] --> B[Start qdrant service]
  A --> C[Start ollama service]
  C --> D[ollama-model-init pulls model]
  B --> E[Start geoportal-mcp-server]
  D --> E
  E --> F{RAG_BOOTSTRAP_MODE}
  F -->|sync| G[Incremental Sync]
  F -->|ingest| H[Full Ingest]
  F -->|disabled| I[Skip]
  G --> J[Start MCP Server]
  H --> J
  J --> K[Ready on stdio/http]
```
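
The startup flowchart is a dependency graph: the MCP server waits for Qdrant and for the model pull, which in turn waits for Ollama. As an illustration only, those edges can be encoded and ordered with Python's standard graphlib module (3.9+). STARTUP_DEPENDENCIES is a hypothetical transcription of the diagram, not how docker-compose.yml declares its dependencies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical transcription of the flowchart: each service maps to the
# services it must wait for.
STARTUP_DEPENDENCIES: Dict[str, Set[str]] = {
    "qdrant": set(),
    "ollama": set(),
    "ollama-model-init": {"ollama"},
    "geoportal-mcp-server": {"qdrant", "ollama-model-init"},
}


def startup_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid start order; raises graphlib.CycleError on circular waits."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(startup_order(STARTUP_DEPENDENCIES)))
```

Independent services (qdrant and ollama) may appear in either order; only the dependency edges are guaranteed.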

Project Layout

  • src/geoportal_mcp/config.py: Pydantic settings from environment.
  • src/geoportal_mcp/auth.py: OIDC token provider (D4Science compatible).
  • src/geoportal_mcp/rag/: Core RAG logic (normalize, embeddings, qdrant).
  • src/geoportal_mcp/rag_cli.py: RAG administration CLI.
  • src/geoportal_mcp/rag_profile_cli.py: RAG profile benchmarking and management CLI.
  • src/geoportal_mcp/server.py: MCP tool definitions and server bootstrap.
  • ai/: Source of truth for agent prompts and skills.

⚙️ Configuration

Use .env as the primary configuration method. See .env.example for the full list of variables and detailed descriptions.

Key Categories

  1. Connection: GEOPORTAL_API_BASE_URL, GEOPORTAL_SERVICE_URL.
  2. Auth: OIDC_* variables (support for D4Science UMA and generic OIDC).
  3. RAG: RAG_DATA_DIR, RAG_CHUNK_SIZE, RAG_BOOTSTRAP_MODE.
  4. Parallelism: RAG_FULL_PARALLEL_ENABLED, RAG_PARALLEL_WORKERS.
  5. Backends: QDRANT_URL, RAG_EMBEDDING_URL.
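
config.py loads these values with Pydantic settings. The stdlib sketch below only mimics that pattern to show how the variables map to typed fields. RagSettings and load_settings are hypothetical, which variables are treated as required is an assumption, and apart from RAG_BOOTSTRAP_MODE=sync (documented above) the defaults are placeholders; check .env.example for the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("GEOPORTAL_API_BASE_URL", "QDRANT_URL", "RAG_EMBEDDING_URL")  # assumed required
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RagSettings:
    geoportal_api_base_url: str
    qdrant_url: str
    rag_embedding_url: str
    rag_bootstrap_mode: str = "sync"         # default documented in this README
    rag_full_parallel_enabled: bool = False  # placeholder default
    rag_parallel_workers: int = 4            # placeholder default


def load_settings(env: Optional[Mapping[str, str]] = None) -> RagSettings:
    """Build typed settings from environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    workers = int(env.get("RAG_PARALLEL_WORKERS", "4"))
    if workers < 1:
        raise ValueError("RAG_PARALLEL_WORKERS must be at least 1")

    parallel = env.get("RAG_FULL_PARALLEL_ENABLED", "false").strip().lower() in TRUTHY
    return RagSettings(
        geoportal_api_base_url=env["GEOPORTAL_API_BASE_URL"],
        qdrant_url=env["QDRANT_URL"],
        rag_embedding_url=env["RAG_EMBEDDING_URL"],
        rag_bootstrap_mode=env.get("RAG_BOOTSTRAP_MODE", "sync").strip().lower(),
        rag_full_parallel_enabled=parallel,
        rag_parallel_workers=workers,
    )
```

Passing an explicit mapping instead of os.environ keeps the loader easy to test; Pydantic settings add type coercion and .env file parsing on top of the same idea.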

🧠 Local RAG Pipeline

The server manages a local vector index of Geoportal projects to enable semantic search.

CLI Management (geoportal-mcp-rag and geoportal-mcp-rag-profile)

| Command | Description |
| --- | --- |
| geoportal-mcp-rag ingest | Full data fetch and re-index. |
| geoportal-mcp-rag sync | Incremental update based on content hash. |
| geoportal-mcp-rag query | Test semantic search from CLI. |
| geoportal-mcp-rag reset | Drop the Qdrant collection. |
| geoportal-mcp-rag status | Show collection and backend health. |
| geoportal-mcp-rag enrich-gis-links | Manually trigger GIS link resolution. |
| geoportal-mcp-rag-profile query | Test queries across different embedding profiles. |
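
sync is described as an incremental update based on a content hash. One common way to do that, sketched here under assumptions, is to hash a canonical JSON encoding of each project and compare it with the hashes stored from the previous run. content_hash, plan_sync, the id field name, and the choice to delete projects that disappeared from the source are all hypothetical; this README does not say how removals are handled.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Tuple


def content_hash(project: Dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not affect the hash."""
    canonical = json.dumps(project, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(
    projects: Iterable[Dict[str, Any]],
    stored_hashes: Dict[str, str],
    id_field: str = "id",
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare freshly fetched projects with stored hashes.

    Returns (ids_to_upsert, ids_to_delete, new_hash_state).
    """
    new_state: Dict[str, str] = {}
    to_upsert: List[str] = []
    for project in projects:
        project_id = str(project[id_field])
        digest = content_hash(project)
        new_state[project_id] = digest
        if stored_hashes.get(project_id) != digest:  # new or modified project
            to_upsert.append(project_id)
    # Assumption: projects missing from the fresh fetch are removed from the index.
    to_delete = sorted(set(stored_hashes) - set(new_state))
    return to_upsert, to_delete, new_state
```

Only the ids in ids_to_upsert need to go through normalize, embed, and upsert again, which is what makes a sync cheaper than a full ingest.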

Parallel Ingest

Set RAG_FULL_PARALLEL_ENABLED=true to run the enrich, normalize, embed, and upsert stages in parallel workers.

  • Failures are appended, in a thread-safe way, to a JSONL file at <RAG_DATA_DIR>/state/parallel-failures.jsonl.
  • Control the number of workers with RAG_PARALLEL_WORKERS.
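
The pattern can be illustrated as follows: a thread pool processes items independently, and each worker appends its own failures to a JSONL file guarded by a lock, so lines from concurrent writers never interleave. FailureLog, run_parallel, and the record fields (ts, id, stage, error) are hypothetical; only the file location and the JSONL format come from this README.

```python
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Tuple


class FailureLog:
    """Append-only JSONL writer; the lock keeps concurrent lines from interleaving."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._lock = threading.Lock()

    def record(self, item_id: str, stage: str, error: BaseException) -> None:
        line = json.dumps({
            "ts": time.time(),
            "id": item_id,
            "stage": stage,
            "error": f"{type(error).__name__}: {error}",
        })
        with self._lock, self.path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def run_parallel(
    items: Iterable[Dict[str, Any]],
    process: Callable[[Dict[str, Any]], None],
    failures: FailureLog,
    workers: int = 4,
    stage: str = "pipeline",
) -> List[str]:
    """Process items in a thread pool and return the ids that succeeded.

    A failing item is logged and skipped, so one bad record does not abort the run.
    """

    def guarded(item: Dict[str, Any]) -> Tuple[str, bool]:
        item_id = str(item["id"])
        try:
            process(item)
            return item_id, True
        except Exception as exc:
            failures.record(item_id, stage, exc)  # runs on the worker thread
            return item_id, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(guarded, items))
    return [item_id for item_id, ok in results if ok]
```

In this sketch, workers plays the role of RAG_PARALLEL_WORKERS, and the log path would be <RAG_DATA_DIR>/state/parallel-failures.jsonl.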

🛠️ MCP Tools

Standard Tools

  • health_check: Server status.
  • get_use_case_descriptors: Fetch available UCDs.
  • list_projects / get_project: Basic project exploration.
  • rag_status: Check indexing and backend availability.
  • geosemantic_search: Semantic search with geographic filters (bbox, radius).
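
geosemantic_search accepts bounding-box and radius filters. The checks themselves are simple; here is a sketch using the haversine great-circle distance. BBox, passes_geo_filter, and the (lon, lat) argument order are assumptions for illustration, and the box test does not handle ranges that cross the antimeridian. Qdrant also provides native geo_bounding_box and geo_radius payload filters; this README does not say which approach the server uses.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

Point = Tuple[float, float]  # (lon, lat) in degrees, an assumed convention


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # Simple box test; boxes crossing the antimeridian are not handled.
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against h drifting slightly above 1 from rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def passes_geo_filter(point: Point, bbox: Optional[BBox] = None,
                      center: Optional[Point] = None,
                      radius_km: Optional[float] = None) -> bool:
    """True if the point satisfies every filter that is set."""
    if bbox is not None and not bbox.contains(*point):
        return False
    if center is not None and radius_km is not None and haversine_km(center, point) > radius_km:
        return False
    return True
```

Applying the cheap box test first lets most out-of-area points be rejected before any trigonometry runs.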

Analytics Tools

Computed directly from indexed RAG payloads:

  • analytics_collection_summary: Volume, distribution, geo coverage.
  • analytics_quality_report: Weighted quality scores per document.
  • analytics_compare_profiles: Compare different profile IDs by richness and recency.
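
A weighted quality score is typically the share of weight earned by the fields a document actually fills in. The following sketch assumes hypothetical field names and weights (DEFAULT_WEIGHTS); the criteria used by analytics_quality_report are not documented here.

```python
from typing import Any, Mapping

# Hypothetical fields and weights, for illustration only.
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "title": 0.2,
    "description": 0.3,
    "geometry": 0.3,
    "gis_links": 0.2,
}


def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def quality_score(payload: Mapping[str, Any],
                  weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Share of the total weight earned by filled fields, in [0, 1]."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(w for name, w in weights.items() if is_filled(payload.get(name)))
    return round(earned / total, 3)
```

Normalizing by the total weight keeps scores comparable even if a different weight table is passed in.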

🤖 AI Agent Integration

All agent prompts and skills are authored in ai/ and synchronized to client folders (e.g., .claude/).

To sync changes:

```bash
python3 ai/adapters/sync_prompts_skills.py
```
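
The sync step is essentially a one-way mirror from ai/ into each client folder. The real script lives at ai/adapters/sync_prompts_skills.py and may transform files per client; the sketch below, built around a hypothetical mirror_markdown helper, only shows a copy-if-changed core using the standard library.

```python
from pathlib import Path
from typing import List


def mirror_markdown(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Copy every *.md file under src_dir to the same relative path under dst_dir.

    Files whose bytes are already identical are left alone, so repeated runs
    only touch what changed. Returns the destination paths that were written.
    """
    written: List[Path] = []
    for src in sorted(src_dir.rglob("*.md")):
        if not src.is_file():
            continue  # skip directories whose names happen to end in .md
        data = src.read_bytes()
        dst = dst_dir / src.relative_to(src_dir)
        if dst.is_file() and dst.read_bytes() == data:
            continue  # unchanged
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(data)
        written.append(dst)
    return written
```

This sketch never deletes files in the destination, so a prompt removed from ai/ would linger in the client folder.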

Guided Prompts

Templates available for specialized analysis:

  • geoportal-quality-review.prompt.md
  • geoportal-profile-comparison.prompt.md
  • geoportal-geo-coverage-review.prompt.md

💻 Local Development (without Docker)

  1. Recreate the virtual environment:

     ```bash
     rm -rf .venv
     python3 -m venv .venv
     source .venv/bin/activate
     pip install -e ".[dev]"
     ```

  2. Run in HTTP mode:

     ```bash
     MCP_TRANSPORT=streamable-http geoportal-mcp-server
     ```

     The server then exposes the endpoints /, /mcp, /healthz, and /status.
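
To check from a script that the HTTP server is up, a probe of /healthz is enough. This sketch assumes only that a healthy server answers GET /healthz with HTTP 200 (the response body is not documented here); is_healthy is hypothetical, and the base URL depends on your host and port configuration.

```python
import http.client
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    # An empty ProxyHandler ignores *_proxy environment variables, so the
    # probe talks to the server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses.
        return False
```

For example, call is_healthy("http://localhost:<port>") with the port your server listens on, and retry in a loop until it returns True before sending MCP requests.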

🧪 Testing

```bash
# Run all tests
pytest

# Run with explanatory logs (recommended for debugging)
pytest -o log_cli=true --log-cli-level=INFO
```

Main tests:

  • tests/test_auth.py: Token cache and UMA flow validation.
  • tests/test_geoportal_client.py: API response handling.
  • tests/rag/: Coverage for pipeline, storage, and embeddings.

📝 Notes

  • Security: Secrets must come from the environment. /mcp can be protected via MCP_ACCESS_TOKEN.
  • D4Science: Use OIDC_GRANT_TYPE=uma-ticket for service account compatibility.
  • Ollama: For maximum parallelism, set OLLAMA_NUM_PARALLEL to match RAG_PARALLEL_WORKERS, so Ollama can serve requests from all workers concurrently.
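
If /mcp is protected via MCP_ACCESS_TOKEN, a client must present the token on each request. The sketch below shows a constant-time check of the kind such protection usually relies on. It assumes an "Authorization: Bearer <token>" header and that protection is off when the variable is unset; neither is confirmed by this README, and is_authorized is hypothetical.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Check an "Authorization: Bearer <token>" header against the configured token."""
    if not expected_token:
        return True  # assumption: no MCP_ACCESS_TOKEN means /mcp is unprotected
    # HTTP header names are case-insensitive.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.strip().partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode("utf-8"), expected_token.encode("utf-8"))
```

A long random value, for example from Python's secrets.token_urlsafe(32), makes a good token.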

👤 Authors

  • Francesco Mangiacrapa (ORCID), Istituto di Scienza e Tecnologie dell'Informazione 'A. Faedo', Consiglio Nazionale delle Ricerche, Pisa, Italy
  • AI-assisted development:
    • OpenAI ChatGPT
    • Google Gemini
    • Claude