Integration¶
This chapter details how each service in the catalog is instrumented to feed the LGTM stack with metrics, logs, and traces.
Integration overview¶
| Service | Metrics | Logs | Traces | Dashboard |
|---|---|---|---|---|
| Keycloak | Native /metrics endpoint | JSON logs on stdout | OTel Java SDK | Keycloak Overview |
| CoreDNS | Native /metrics endpoint | Structured logs | No (no DNS traces) | CoreDNS Overview |
| CI/CD (Gitea/ArgoCD) | /metrics endpoints | JSON logs | OTel spans (pipelines) | CI/CD Pipelines |
| Business applications | OTel SDK | Structured JSON logs | OTel SDK (auto-instrumentation) | Service Overview |
| Infrastructure (nodes) | node_exporter | systemd journal | No | Node Overview |
| Containers (Podman/K8s) | cAdvisor / kubelet | Container logs | No | Container Overview |
Keycloak metrics¶
Keycloak exposes Prometheus metrics natively:
# Keycloak configuration to expose metrics
# Environment variables or startup arguments
KC_METRICS_ENABLED: "true"
KC_HEALTH_ENABLED: "true"
# Prometheus / Alloy scrape config
scrape_configs:
  - job_name: "keycloak"
    metrics_path: /metrics
    scheme: https
    tls_config:
      insecure_skip_verify: false
      ca_file: /etc/prometheus/ca.pem
    static_configs:
      - targets: ["keycloak.entreprise:8443"]
    # Filtering on __name__ must happen after the scrape,
    # hence metric_relabel_configs rather than relabel_configs
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "keycloak_(logins?|registrations?|token|request|active)_.*"
        action: keep
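Prometheus anchors relabel regexes, so `re.fullmatch` mimics a `keep` action; a quick Python sketch (the pattern mirrors the one above, metric names are illustrative) shows which series survive:

```python
import re

# Prometheus anchors relabel regexes, so fullmatch mimics a "keep" action.
# Illustrative pattern covering the Keycloak metrics listed below.
KEEP = re.compile(r"keycloak_(logins?|registrations?|token|request|active)_.*")

def kept(metric_name: str) -> bool:
    """True if the relabel rule would keep this series."""
    return KEEP.fullmatch(metric_name) is not None

assert kept("keycloak_logins_total")
assert kept("keycloak_request_duration_seconds_bucket")
assert not kept("jvm_memory_used_bytes")  # dropped: not a Keycloak metric
```

Checking a keep rule offline like this avoids silently dropping a metric your dashboards depend on.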
Key metrics to watch:
| Metric | Type | Meaning |
|---|---|---|
| keycloak_logins_total | Counter | Total number of logins |
| keycloak_login_errors_total | Counter | Authentication failures |
| keycloak_registrations_total | Counter | User registrations |
| keycloak_request_duration_seconds | Histogram | Request latency |
| keycloak_active_sessions | Gauge | Active sessions |
# Authentication failure rate (alert if > 10%)
rate(keycloak_login_errors_total[5m])
/
rate(keycloak_logins_total[5m])
> 0.1
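To see what this expression computes, here is a minimal Python sketch of PromQL's rate() over two counter samples (values illustrative; real rate() also handles counter resets and range extrapolation):

```python
# Illustrative samples: (timestamp_seconds, counter_value) over a 5-minute window
errors = [(0, 100), (300, 130)]    # keycloak_login_errors_total
logins = [(0, 1000), (300, 1200)]  # keycloak_logins_total

def rate(samples):
    """Per-second increase of a monotonic counter, like PromQL rate()
    (sketch: no counter-reset handling, no extrapolation)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

failure_ratio = rate(errors) / rate(logins)  # 0.1 / 0.667 = 0.15
assert failure_ratio > 0.1  # this window would fire the alert
```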
CoreDNS metrics¶
CoreDNS exposes metrics via the prometheus plugin:
# Corefile — enable metrics
.:53 {
    prometheus :9153
    forward . 8.8.8.8 8.8.4.4
    cache 30
    log
    errors
}
Key metrics:
| Metric | Type | Meaning |
|---|---|---|
| coredns_dns_requests_total | Counter | DNS requests by type (A, AAAA, CNAME) |
| coredns_dns_responses_total | Counter | Responses by code (NOERROR, NXDOMAIN, SERVFAIL) |
| coredns_dns_request_duration_seconds | Histogram | Resolution latency |
| coredns_cache_hits_total | Counter | DNS cache hits |
| coredns_cache_misses_total | Counter | DNS cache misses |
# RED signals for CoreDNS
# Rate
sum(rate(coredns_dns_requests_total[5m]))
# Errors
sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
# Duration (p99)
histogram_quantile(0.99, sum by(le) (rate(coredns_dns_request_duration_seconds_bucket[5m])))
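histogram_quantile() estimates a quantile from cumulative bucket counts; a compact Python sketch of the interpolation (bucket bounds illustrative; PromQL additionally handles the +Inf bucket and rate-normalized counts):

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative ('le') histogram buckets with
    linear interpolation inside the winning bucket, like PromQL (sketch)."""
    total = buckets[-1][1]
    rank = q * total
    lower_le, lower_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            # interpolate between this bucket's lower and upper bound
            return lower_le + (le - lower_le) * (rank - lower_count) / (count - lower_count)
        lower_le, lower_count = le, count
    return buckets[-1][0]

# Illustrative buckets: (upper_bound_seconds, cumulative_count)
buckets = [(0.005, 800), (0.01, 950), (0.05, 990), (0.1, 1000)]
p99 = histogram_quantile(0.99, buckets)  # 990 of 1000 requests fit under 50 ms
```

This is also why bucket boundaries matter: the estimate can never be more precise than the width of the bucket the quantile lands in.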
CI/CD metrics¶
Gitea¶
# Scrape config
- job_name: "gitea"
  metrics_path: /metrics
  bearer_token_file: /etc/prometheus/gitea-token
  static_configs:
    - targets: ["gitea.chaine-logicielle:3000"]
ArgoCD¶
# ArgoCD exposes metrics on three endpoints
- job_name: "argocd-server"
  static_configs:
    - targets: ["argocd-server-metrics.cicd:8083"]
- job_name: "argocd-repo-server"
  static_configs:
    - targets: ["argocd-repo-server.cicd:8084"]
- job_name: "argocd-app-controller"
  static_configs:
    - targets: ["argocd-application-controller-metrics.cicd:8082"]
Key CI/CD metrics:
| Metric | Meaning |
|---|---|
| argocd_app_sync_total | Number of syncs |
| argocd_app_health_status | Application health |
| gitea_builds_total | Number of builds |
| Pipeline duration | Pipeline run time |
Structured application logs¶
Standard JSON format¶
All services must emit JSON logs with a consistent schema:
{
  "timestamp": "2026-04-16T14:23:45.123Z",
  "level": "info",
  "service": "catalog-api",
  "version": "1.4.2",
  "environment": "production",
  "trace_id": "abc123def456",
  "span_id": "789ghi012",
  "message": "request handled",
  "http": {
    "method": "GET",
    "path": "/api/v1/products",
    "status_code": 200,
    "duration_ms": 42,
    "client_ip": "10.20.1.45"
  }
}
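A minimal stdlib-only Python formatter producing this schema (the service name is hard-coded for illustration; trace_id/span_id would normally be injected from the active OTel context):

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Render log records as JSON following the schema above (sketch)."""
    def format(self, record):
        entry = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created))
                         + f".{int(record.msecs):03d}Z",
            "level": record.levelname.lower(),
            "service": "catalog-api",  # assumption: read from config in real code
            "message": record.getMessage(),
        }
        # Optional fields passed via logger.info(..., extra={...})
        for key in ("trace_id", "span_id", "http"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("catalog-api")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("request handled", extra={"http": {"method": "GET", "status_code": 200}})
```

One line of JSON per record on stdout/stderr is exactly what the Alloy pipeline below expects to parse.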
Alloy configuration for log collection¶
// Collect Kubernetes container logs
// (assumes a discovery.kubernetes "pods" component defined elsewhere)
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.process.default.receiver]
}

// Log processing pipeline
loki.process "default" {
  // Parse JSON logs
  stage.json {
    expressions = {
      level     = "level",
      service   = "service",
      trace_id  = "trace_id",
      timestamp = "timestamp",
    }
  }

  // Add labels
  stage.labels {
    values = {
      level   = "",
      service = "",
    }
  }

  // Redact PII (see the Confidentiality chapter):
  // the captured secret value is replaced, the key itself is kept
  stage.replace {
    expression = "(?:password|token|secret)=(\\S+)"
    replace    = "***REDACTED***"
  }

  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
Loki labels
Keep the number of Loki labels per stream low (< 15). High-cardinality values (user_id, request_id) must never become Loki labels — filter on log content instead, via | json | user_id="xxx".
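The redaction step can be reproduced, and unit-tested, outside Alloy; a Python sketch of the intended behavior (mask the secret value, keep the key):

```python
import re

# Same intent as the Alloy stage.replace above
PII = re.compile(r"(password|token|secret)=\S+")

def redact(line: str) -> str:
    """Replace the value of sensitive key=value pairs, keeping the key."""
    return PII.sub(r"\1=***REDACTED***", line)

assert redact("login ok token=abc123 user=alice") == \
       "login ok token=***REDACTED*** user=alice"
```

Running the same pattern in a test suite catches regressions before a secret ever reaches Loki.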
Distributed traces¶
Auto-instrumentation with OpenTelemetry¶
For a Java application (Spring Boot):
# Dockerfile — add the OTel agent
FROM eclipse-temurin:21-jre
COPY app.jar /app.jar
ADD https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar /otel-agent.jar
ENV JAVA_TOOL_OPTIONS="-javaagent:/otel-agent.jar"
ENV OTEL_SERVICE_NAME="catalog-api"
ENV OTEL_EXPORTER_OTLP_ENDPOINT="http://alloy:4317"
ENV OTEL_TRACES_SAMPLER="parentbased_traceidratio"
ENV OTEL_TRACES_SAMPLER_ARG="0.1"
ENTRYPOINT ["java", "-jar", "/app.jar"]
For a Python application (Flask/FastAPI):
# Start with auto-instrumentation
OTEL_SERVICE_NAME=user-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4317 \
OTEL_TRACES_SAMPLER=parentbased_traceidratio \
OTEL_TRACES_SAMPLER_ARG=0.1 \
opentelemetry-instrument python app.py
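The parentbased_traceidratio sampler keeps a deterministic fraction of traces; the core idea, sketched in Python (exact bit handling and rounding vary between SDK implementations):

```python
import random

RATIO = 0.1  # matches OTEL_TRACES_SAMPLER_ARG above

def should_sample(trace_id: int, ratio: float = RATIO) -> bool:
    """Deterministic head sampling: keep a trace when the low 64 bits of its
    ID fall below ratio * 2**64 (a sketch; details differ per SDK)."""
    return (trace_id & 0xFFFFFFFFFFFFFFFF) < int(ratio * 2**64)

# Every service using the same ratio makes the same decision for a given
# trace ID, and parentbased_* defers to the parent span's decision.
random.seed(42)
sampled = sum(should_sample(random.getrandbits(128)) for _ in range(10_000))
assert 800 < sampled < 1200  # roughly 10% of traces are kept
```

Because the decision is a pure function of the trace ID, a trace is either sampled end to end or not at all, which keeps traces complete in Tempo.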
For a Go application:
import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/trace"
)

func initTracer(ctx context.Context) (*trace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("alloy:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }
    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        // Same 10% head sampling as the Java and Python examples
        trace.WithSampler(trace.ParentBased(trace.TraceIDRatioBased(0.1))),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}
Grafana dashboards per service¶
Grafana folder structure¶
Grafana Dashboards/
├── Infrastructure/
│   ├── Node Overview
│   ├── Container Overview
│   └── MinIO Overview
├── Production services/
│   ├── Keycloak Overview
│   ├── CoreDNS Overview
│   └── Observability (meta-monitoring)
├── Software factory/
│   ├── Gitea Overview
│   ├── ArgoCD Overview
│   └── CI/CD Pipelines
└── Applications/
    ├── Service Overview (template)
    └── [One per business application]
The "Service Overview" dashboard template¶
Every service must have a dashboard covering the golden signals:
| Panel | PromQL query | Type |
|---|---|---|
| Request rate | sum(rate(http_requests_total{service="$service"}[5m])) | Graph |
| Error rate | sum(rate(http_requests_total{service="$service",status=~"5.."}[5m])) / sum(rate(http_requests_total{service="$service"}[5m])) | Gauge |
| Latency p99 | histogram_quantile(0.99, sum by(le) (rate(http_request_duration_seconds_bucket{service="$service"}[5m]))) | Graph |
| Saturation | process_resident_memory_bytes{service="$service"} / container_spec_memory_limit_bytes{service="$service"} | Gauge |
| Recent logs | {service="$service"} \| json \| level=~"error\|warn" | Logs panel |
| Active traces | Link to Tempo filtered on service.name="$service" | Link |
Dashboards as code
Grafana dashboards must be versioned in Git (JSON export) and provisioned automatically via Kubernetes ConfigMaps or Grafana provisioning. Never create a dashboard only through the UI.
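Versioned dashboards can be linted in CI before provisioning; a hypothetical minimal check (file name and rules illustrative, extend as needed):

```python
import json
import pathlib
import tempfile

def check_dashboard(path: pathlib.Path) -> list[str]:
    """Minimal pre-provisioning lint for an exported dashboard JSON:
    require a stable uid and a title (illustrative checks)."""
    dash = json.loads(path.read_text())
    problems = []
    if not dash.get("uid"):
        problems.append(f"{path.name}: missing uid (links would break on re-import)")
    if not dash.get("title"):
        problems.append(f"{path.name}: missing title")
    return problems

# Illustrative run against a throwaway export
with tempfile.TemporaryDirectory() as d:
    good = pathlib.Path(d) / "service-overview.json"
    good.write_text(json.dumps({"uid": "svc-overview", "title": "Service Overview"}))
    assert check_dashboard(good) == []
```

A stable uid is what keeps deep links and alert annotations working when the dashboard is re-provisioned.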