Architecture

Source Layout

skipper/
├── Dockerfile                          GraalVM multi-stage build (native binary → debian:12-slim)
├── pom.xml
├── helm/naftiko-skipper/               Helm chart
│   ├── crds/                           CRD YAML installed by Helm
│   └── templates/                      Operator Deployment, RBAC, ServiceAccount
├── config/
│   ├── crds/                           CRD definitions — managed by ArgoCD (wave -1)
│   │   ├── application.yaml
│   │   └── manifests/
│   │       ├── naftiko-capability-crd.yaml
│   │       └── capability-class-crd.yaml
│   ├── defaults/                       Default CapabilityClass instances — managed by ArgoCD (wave 0)
│   │   ├── application.yaml
│   │   └── manifests/
│   │       ├── capability-class-standard.yaml
│   │       ├── capability-class-premium.yaml
│   │       └── capability-class-dev.yaml
│   ├── operator/                       Operator ArgoCD Application (wave 1)
│   │   └── application.yaml
│   ├── capabilities/                   ApplicationSet template for user capabilities
│   │   └── applicationset.template.yaml
│   └── samples/                        Sample Capability CRs for testing and CI
└── src/main/java/io/naftiko/operator/
    ├── NaftikoOperator.java            Main entry point
    ├── CapabilityReconciler.java       Core reconciliation logic
    └── crd/
        ├── CapabilityResource.java
        ├── CapabilitySpec.java
        ├── CapabilityStatus.java
        ├── CapabilityClassResource.java
        └── CapabilityClassSpec.java

Reconciliation Flow


Core Components

Capability CRD

The Capability resource is the primary user-facing API.

Users submit a capability specification through:
- a referenced ConfigMap (recommended: the specRef pattern)
- labels describing tier/domain metadata
- exposed REST, MCP, skill, or control endpoints

Example:

apiVersion: naftiko.io/v1alpha3
kind: Capability
metadata:
  name: hello-world
spec:
  specRef:
    configMap: hello-world-spec
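
For illustration, the ConfigMap referenced by specRef might look like the sketch below. Only the ConfigMap name comes from the example above; the data key capability.yaml and the minimal spec body are assumptions, not the documented schema.

```yaml
# Hypothetical ConfigMap referenced by spec.specRef.configMap.
# The data key name and the spec contents are assumed for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world-spec
data:
  capability.yaml: |
    capability:
      exposes:
        - type: mcp
          port: 3001
```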

CapabilityClass

CapabilityClass defines operational defaults per resource tier:
- CPU requests/limits
- memory requests/limits
- HPA autoscaling configuration
- Resilience4j defaults (circuit breaker, retry, bulkhead, rate limiter)

Selection is driven by:

info.labels["naftiko.io/tier"]

Three tiers ship by default: standard, premium, dev. If no matching class is found, the operator falls back to built-in standard defaults.
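
As a rough sketch, a capability spec that selects the premium tier might carry the label as follows. The label key and tier names come from the text above; placing info at the top level of the spec is an assumption.

```yaml
# Fragment of a capability spec (the placement of `info` is assumed).
info:
  labels:
    naftiko.io/tier: premium   # one of the shipped tiers: standard, premium, dev
```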


CapabilityReconciler

The reconciler is the core controller loop.

Responsibilities:
- watch Capability resources
- resolve the CapabilityClass
- resolve bind secrets and import mounts
- generate all child resources
- maintain drift correction
- patch status

Reconciliation flow:

observe → compare → reconcile → patch status

Generated Resources

For each Capability CR, Skipper creates and continuously reconciles:

ConfigMap

Stores the full capability specification verbatim at key capability.yaml. Mounted into the engine pod at /data/capability.yaml using subPath so that import file mounts under /data/ can coexist without conflict.
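
The subPath mount described above could be expressed roughly as in this pod template fragment. The volume name, container name, and generated ConfigMap name are hypothetical; the key and mount path are the ones stated in the text.

```yaml
# Pod template fragment; volume, container, and ConfigMap names are hypothetical.
spec:
  volumes:
    - name: capability-spec
      configMap:
        name: hello-world            # generated ConfigMap (name assumed)
  containers:
    - name: engine
      image: ghcr.io/naftiko/ikanos:latest
      volumeMounts:
        - name: capability-spec
          mountPath: /data/capability.yaml
          subPath: capability.yaml   # mounts only this key, as a single file
```

One property of subPath worth keeping in mind: Kubernetes does not propagate ConfigMap updates into subPath mounts, so a changed spec reaches a running container only after the pod is recreated.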

Deployment

Skipper generates a Deployment running:

ghcr.io/naftiko/ikanos:latest

The Deployment:
- mounts the spec ConfigMap at /data/capability.yaml
- mounts bind secrets at the declared file:// path (e.g. /app/shared/)
- mounts import ConfigMaps as individual files under /data/shared/
- injects OpenTelemetry env vars (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, etc.)
- exposes one named container port per expose entry (mcp, rest, skill, control)
- adds Prometheus scrape annotations when a type: control expose is declared
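
To make the port and annotation behaviour concrete, here is a sketch of the pod template for a capability that exposes mcp and control. The container name and the annotation values are assumptions based on common Prometheus conventions; the port numbers follow the examples used elsewhere in this document.

```yaml
# Pod template fragment for a capability exposing mcp and control.
# Container name and annotation values are assumed, not taken from operator output.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: /metrics
spec:
  containers:
    - name: engine
      ports:
        - name: mcp
          containerPort: 3001
        - name: control
          containerPort: 9090
```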

Service

A ClusterIP Service is created with one named port per expose entry:

ports:
  - name: mcp
    port: 3001
  - name: rest
    port: 3002
  - name: skill
    port: 3003
  - name: control
    port: 9090

Ingress

Ingress creation is conditional. An Ingress is generated only when:

exposes[].tags contains "public"
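
A minimal sketch of an expose entry that would satisfy this condition is shown below. The type and port values are taken from the examples in this document; writing tags as a plain list of strings is an assumption.

```yaml
# Capability spec fragment: the rest adapter is tagged public, so an Ingress is generated.
capability:
  exposes:
    - type: rest
      port: 3002
      tags:
        - public
```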

ServiceMonitor

A Prometheus Operator ServiceMonitor is created automatically when a type: control expose is declared. It targets the control named port at /metrics with a 15-second scrape interval.

If Prometheus Operator CRDs are not installed, the operator logs a warning and falls back gracefully — Prometheus pod annotations (prometheus.io/scrape, prometheus.io/port, prometheus.io/path) are always written on the pod template regardless.
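
A ServiceMonitor matching the description above might look like the following sketch. The resource name and selector labels are hypothetical; the port name, path, and interval are the ones stated in the text.

```yaml
# Sketch of a ServiceMonitor for a capability named hello-world.
# metadata.name and the selector labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hello-world
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: hello-world   # hypothetical Service label
  endpoints:
    - port: control      # named Service port
      path: /metrics
      interval: 15s
```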


Runtime Flow

User applies Capability CR
Operator receives event
Reconciler resolves spec
        ├─ reads specRef ConfigMap (verbatim, no re-serialization)
        ├─ resolves CapabilityClass tier
        ├─ resolves bind secrets  →  /app/shared/secrets.yaml
        └─ resolves import mounts →  /data/shared/*.yaml
Reconciler generates child resources
        ├─ create/update ConfigMap
        ├─ create/update Deployment  (all ports + OTEL env vars)
        ├─ create/update Service     (all named ports)
        ├─ reconcile Ingress         (only if "public" tag)
        └─ reconcile ServiceMonitor  (only if control port)
Engine pod starts
        ├─ reads /data/capability.yaml
        ├─ loads imports from /data/shared/
        ├─ resolves secrets from /app/shared/
        └─ starts adapters on declared ports
Capability endpoints become available

Multi-Port Support

A capability can expose multiple adapters simultaneously, each on its own port. Skipper generates a named Service port and container port for every entry in exposes[].

capability:
  exposes:
    - type: mcp     # → Service port "mcp":3001,  container port 3001
      port: 3001
    - type: rest    # → Service port "rest":3002,  container port 3002
      port: 3002
    - type: skill   # → Service port "skill":3003, container port 3003
      port: 3003
    - type: control # → Service port "control":9090, container port 9090
      port: 9090    #   + ServiceMonitor + pod annotations

Bind Secret Convention

Multiple bind namespaces that share the same location path are backed by one Kubernetes Secret containing all keys combined.

Secret naming: {capability-name}-bind-{parent-directory}

file:///./shared/secrets.yaml  →  {name}-bind-shared
file:///./config/db.yaml       →  {name}-bind-config

This avoids projected volume conflicts when two namespaces point to the same file — two secrets with the same key cannot be merged by Kubernetes projected volumes without one silently overwriting the other.
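
As an illustration of the naming rule, a capability named hello-world whose binds all point at file:///./shared/secrets.yaml would be backed by a Secret along these lines. Using the file name as the Secret key, and the placeholder entries inside it, are assumptions.

```yaml
# Hypothetical Secret for a capability named hello-world.
# The data key (file name) and the entries inside it are assumed for illustration.
apiVersion: v1
kind: Secret
metadata:
  name: hello-world-bind-shared   # {capability-name}-bind-{parent-directory}
type: Opaque
stringData:
  secrets.yaml: |
    # combined keys for every bind namespace using this location (placeholder values)
    API_TOKEN: changeme
    DB_PASSWORD: changeme
```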


Import Consumes Convention

capability.consumes entries using the import + location pattern are backed by a dedicated ConfigMap per import.

ConfigMap naming: {capability-name}-import-{import-namespace}

consumes:
  - import: registry
    location: ./shared/step7-registry-consumes.yml

Expects: {name}-import-registry with key step7-registry-consumes.yml.

Each import file is mounted individually via subPath so that multiple imports sharing the same parent directory do not conflict.
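
For a capability named hello-world, the expected ConfigMap and a matching subPath mount could be sketched as follows. The ConfigMap name and key follow the convention above; the volume name is hypothetical and the file contents are left as a placeholder.

```yaml
# ConfigMap expected for the `registry` import of a capability named hello-world.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world-import-registry   # {capability-name}-import-{import-namespace}
data:
  step7-registry-consumes.yml: |
    # contents of ./shared/step7-registry-consumes.yml go here
---
# Matching volumeMount in the engine container (volume name is hypothetical).
volumeMounts:
  - name: import-registry
    mountPath: /data/shared/step7-registry-consumes.yml
    subPath: step7-registry-consumes.yml
```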


Reconciliation Model

Skipper follows the Kubernetes controller pattern.

The operator continuously ensures:

desired state == actual cluster state

If resources drift — deleted Service, modified Deployment, outdated ConfigMap — the reconcile loop restores consistency automatically.

Resources created by Skipper carry OwnerReference pointing to the Capability CR. When the CR is deleted, Kubernetes garbage-collects all child resources.
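
On a generated child resource, the owner reference would appear in metadata roughly as sketched here. The uid is a placeholder, and the controller and blockOwnerDeletion flags are assumptions based on common operator practice.

```yaml
# Metadata fragment of a child resource owned by the hello-world Capability.
metadata:
  name: hello-world
  ownerReferences:
    - apiVersion: naftiko.io/v1alpha3
      kind: Capability
      name: hello-world
      uid: 00000000-0000-0000-0000-000000000000   # placeholder; the real uid of the CR
      controller: true                            # assumed
      blockOwnerDeletion: true                    # assumed
```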


OTEL Env Var Injection

The operator injects OpenTelemetry environment variables into every engine pod. Configuration is driven by env vars on the operator Deployment itself:

Operator env var             Injected into pod as           Purpose
NAFTIKO_ENGINE_IMAGE         (image field)                  ikanos image to run
NAFTIKO_OTEL_ENDPOINT        OTEL_EXPORTER_OTLP_ENDPOINT    OTLP collector endpoint
NAFTIKO_OTEL_PROTOCOL        OTEL_EXPORTER_OTLP_PROTOCOL    grpc or http/protobuf
NAFTIKO_OTEL_HEADERS         OTEL_EXPORTER_OTLP_HEADERS     e.g. DD-API-KEY=xxx
NAFTIKO_OTEL_SAMPLING_RATE   OTEL_TRACES_SAMPLER_ARG        sampling rate 0.0–1.0

OTEL_SERVICE_NAME is always set to naftiko-{capability-name} regardless of operator config.
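
The mapping can be pictured with the two fragments below: the first is a possible env section on the operator Deployment, the second the env that would then be expected on the engine pod of a capability named hello-world. The collector endpoint and sampling rate are placeholder values.

```yaml
# Operator Deployment container fragment; values are placeholders.
env:
  - name: NAFTIKO_OTEL_ENDPOINT
    value: http://otel-collector.observability.svc:4317
  - name: NAFTIKO_OTEL_PROTOCOL
    value: grpc
  - name: NAFTIKO_OTEL_SAMPLING_RATE
    value: "0.25"
---
# Resulting env on the engine pod of a capability named hello-world.
env:
  - name: OTEL_SERVICE_NAME
    value: naftiko-hello-world
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.observability.svc:4317
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.25"
```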


Native Binary Architecture

The operator is compiled as a GraalVM native executable.

Benefits:
- low memory footprint
- fast startup
- reduced JVM overhead
- improved Kubernetes density

Container strategy:

build stage  → GraalVM native-image
runtime      → debian:12-slim

Failure Handling

If reconciliation fails:
- status.phase becomes Failed
- conditions are updated with the error message
- the reconcile loop retries automatically with exponential backoff

Successful reconciliation sets:

status.phase = Running
status.endpoint = http://{name}.{namespace}.svc.cluster.local:{primaryPort}
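
Read back from the cluster, the two outcomes might look roughly like this for hello-world in the default namespace. The choice of 3001 as the primary port, and the condition fields (which follow the common Kubernetes condition shape), are assumptions; the error message is invented for illustration.

```yaml
# Successful reconciliation (primary port assumed to be 3001).
status:
  phase: Running
  endpoint: http://hello-world.default.svc.cluster.local:3001
---
# Failed reconciliation; condition field names and values are assumed.
status:
  phase: Failed
  conditions:
    - type: Ready
      status: "False"
      reason: ReconcileError
      message: "ConfigMap hello-world-spec not found"
```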

Design Principles

Kubernetes-native

Skipper delegates runtime lifecycle management to Kubernetes primitives. The operator only manages desired state — it does not interpret capability business logic.

Declarative Model

Users declare what a capability is and how it should be exposed. Skipper determines how resources are materialized.

Loose Coupling

The engine container (ikanos) is responsible for parsing the spec, serving APIs, and executing runtime behaviour. Skipper only orchestrates Kubernetes resources and is agnostic of the ikanos version.

Spec Fidelity

When specRef is used, the raw YAML from the referenced ConfigMap is written verbatim into the generated ConfigMap — no re-serialization through the Java model. Unknown fields (MCP tools, descriptions, prompts, aggregates) are preserved exactly as authored.