Monitoring Stack (Production Ready) 📊
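Prometheus + Grafana + exporters is the de facto standard combination for monitoring modern production environments. This template sets up a complete monitoring stack: metrics (Prometheus, Node Exporter, cAdvisor), dashboards (Grafana), alerting (Alertmanager) and optional log collection (Loki, Promtail).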
🏗️ Folder Structure
```
monitoring/
├── docker-compose.yml
├── prometheus/
│   ├── prometheus.yml
│   ├── alerts.yml
│   └── data/
├── grafana/
│   ├── provisioning/
│   │   ├── datasources/
│   │   │   └── prometheus.yml
│   │   └── dashboards/
│   │       ├── dashboard.yml
│   │       └── node-exporter.json
│   └── data/
├── alertmanager/
│   ├── config.yml
│   └── data/
└── loki/
    ├── config.yml
    ├── promtail-config.yml
    └── data/
```
🐳 Docker Compose File
```yaml
version: "3.8"

services:
  # Prometheus - metrics collection
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    ports:
      - "127.0.0.1:9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      - ./prometheus/data:/prometheus
      # Required by the docker_sd_configs job in prometheus.yml. This gives the
      # container access to the Docker API, and its user (nobody) must be
      # allowed to access the socket.
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - monitoring
    mem_limit: 1g
    cpus: 1.0

  # Grafana - visualization
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123 # IMPORTANT: change this!
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://monitoring.example.com
      # No pie chart plugin needed: Pie chart is a built-in panel since Grafana 8
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - ./grafana/data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    networks:
      - monitoring
    depends_on:
      - prometheus
    mem_limit: 512m

  # Node Exporter - host (system) metrics
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: always
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    ports:
      - "127.0.0.1:9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    networks:
      - monitoring
    pid: host

  # cAdvisor - container metrics
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: always
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - monitoring
    privileged: true
    devices:
      - /dev/kmsg

  # Alertmanager - alert routing and notifications
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: always
    command:
      - "--config.file=/etc/alertmanager/config.yml"
      - "--storage.path=/alertmanager"
    ports:
      - "127.0.0.1:9093:9093"
    volumes:
      - ./alertmanager/config.yml:/etc/alertmanager/config.yml:ro
      - ./alertmanager/data:/alertmanager
    networks:
      - monitoring

  # Loki - log aggregation (optional)
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: always
    ports:
      - "127.0.0.1:3100:3100"
    volumes:
      - ./loki/config.yml:/etc/loki/local-config.yaml:ro
      - ./loki/data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring

  # Promtail - log shipper (optional)
  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: always
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./loki/promtail-config.yml:/etc/promtail/config.yml:ro
    command: -config.file=/etc/promtail/config.yml
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge
```
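The `promtail` service mounts `./loki/promtail-config.yml`, but the template does not include that file. A minimal sketch, assuming Promtail pushes to the `loki` service on the `monitoring` network and reads the two log directories mounted in the compose file; the job names and `job` label values are illustrative choices:

```yaml
# loki/promtail-config.yml (sketch)
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Read offsets; stored inside the container, so they are lost when it is recreated
positions:
  filename: /tmp/positions.yaml

# Push to the Loki service defined in docker-compose.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Host log files from the /var/log mount
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log

  # Docker json-file logs from the /var/lib/docker/containers mount
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      # Unwrap Docker's JSON log format (log line, stream, timestamp)
      - docker: {}
```

Because the positions file lives in `/tmp`, a recreated container may ship logs it already sent; mounting a small volume for that path avoids this.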
⚙️ Prometheus Configuration
prometheus/prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: "production"
    region: "eu-west-1"

# Alertmanager connection
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Alert rules
rule_files:
  - "alerts.yml"

# Scrape targets
scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node Exporter (system metrics)
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
        labels:
          instance: "production-server"

  # cAdvisor (container metrics)
  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]

  # Discover Docker containers automatically
  # (requires /var/run/docker.sock to be mounted into the prometheus container)
  - job_name: "docker-containers"
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        target_label: container

  # The jobs below assume the corresponding exporter is running. Remove the
  # ones you do not deploy, otherwise the ServiceDown alert fires for them.

  # Postgres Exporter
  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-exporter:9187"]

  # Redis Exporter
  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter:9121"]

  # Nginx Exporter
  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-exporter:9113"]

  # Blackbox Exporter (HTTP/HTTPS health checks)
  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://api.example.com
    relabel_configs:
      # Pass the URL to probe as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the blackbox exporter
      - target_label: __address__
        replacement: blackbox-exporter:9115
```
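As written, the `docker-containers` job turns every container (one target per exposed port and network) into a scrape target, so containers without a `/metrics` endpoint show up with `up == 0` and trip the `ServiceDown` alert. A common pattern is to let containers opt in through a Docker label. The sketch below assumes a label convention chosen for illustration, `prometheus.scrape=true` (hypothetical, not a Docker or Prometheus standard); Prometheus exposes it as `__meta_docker_container_label_prometheus_scrape` because unsupported characters in label names become underscores. It also strips the leading `/` that Docker includes in container names. Like the original job, it needs the Docker socket mounted into the Prometheus container.

```yaml
  # Replacement for the "docker-containers" job (goes under scrape_configs:)
  - job_name: "docker-containers"
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Keep only containers started with the label prometheus.scrape=true
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: "true"
        action: keep
      # Docker reports names as "/name"; store them without the slash
      - source_labels: [__meta_docker_container_name]
        regex: "/(.*)"
        target_label: container
```

An application container then opts in with `labels: { prometheus.scrape: "true" }` in its compose service, as long as Prometheus can reach it on a shared network and it serves metrics at `/metrics` on its exposed port.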
🚨 Alert Rules
prometheus/alerts.yml:
```yaml
groups:
  - name: system_alerts
    interval: 30s
    rules:
      # High CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current: {{ $value }}%)"

      # High memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% (current: {{ $value }}%)"

      # Low free disk space on the root filesystem
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 15% (current: {{ $value }}%)"

      # Scrape target down
      - alert: ServiceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
          description: "{{ $labels.instance }} has been down for more than 2 minutes"

      # Container restarts
      # NOTE: container_last_seen is a Unix-timestamp gauge, so rate() over it is
      # about 1 for every running container and this rule fires for all of them.
      # Swap in a restart-specific metric before relying on this alert.
      - alert: ContainerRestarting
        expr: rate(container_last_seen{name!=""}[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} is restarting"
          description: "Container has restarted {{ $value }} times in the last 5 minutes"

  - name: application_alerts
    interval: 30s
    rules:
      # HTTP 5xx errors
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
          description: "Error rate is {{ $value }} req/s"

      # Slow responses (95th percentile above 1 second)
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time on {{ $labels.instance }}"
          description: "95th percentile is {{ $value }}s"
```
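Alert rules can be unit-tested with `promtool`, which ships with Prometheus, before they reach production. A sketch of a test for the `ServiceDown` rule, assuming it is saved next to `alerts.yml` as `prometheus/alerts_test.yml` (a file name chosen here for illustration). It feeds a synthetic `up` series that stays at 0 and expects the alert to be firing three minutes in, i.e. after the rule's `for: 2m`:

```yaml
# prometheus/alerts_test.yml (hypothetical file name)
rule_files:
  - alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Synthetic scrape results: the target is down (0) for six consecutive samples
    input_series:
      - series: 'up{job="node-exporter", instance="production-server"}'
        values: "0x5"
    alert_rule_test:
      # 3 minutes of up == 0 exceeds for: 2m, so the alert should be firing
      - eval_time: 3m
        alertname: ServiceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: node-exporter
              instance: production-server
            exp_annotations:
              summary: "Service node-exporter is down"
              description: "production-server has been down for more than 2 minutes"
```

Run it with `promtool test rules prometheus/alerts_test.yml`; `promtool check rules prometheus/alerts.yml` only validates the rule syntax.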
📧 Alertmanager Configuration
alertmanager/config.yml:
```yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: "smtp.gmail.com:587"
  smtp_from: "alerts@example.com"
  smtp_auth_username: "alerts@example.com"
  smtp_auth_password: "your-app-password"

# Alert routing
route:
  group_by: ["alertname", "cluster", "service"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: "default"
  routes:
    # Critical alerts: sent after the short default group_wait (10s)
    - match:
        severity: critical
      receiver: "critical-alerts"
      continue: true
    # Warning alerts: wait 5 minutes before the first notification
    - match:
        severity: warning
      receiver: "warning-alerts"
      group_wait: 5m

# Receivers
receivers:
  - name: "default"
    email_configs:
      - to: "team@example.com"
        headers:
          Subject: "[MONITORING] {{ .GroupLabels.alertname }}"
  - name: "critical-alerts"
    email_configs:
      - to: "oncall@example.com"
        headers:
          Subject: "🚨 [CRITICAL] {{ .GroupLabels.alertname }}"
    # Slack integration
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
        channel: "#alerts"
        title: "🚨 Critical Alert"
        text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
  - name: "warning-alerts"
    email_configs:
      - to: "team@example.com"
        headers:
          Subject: "⚠️ [WARNING] {{ .GroupLabels.alertname }}"

# Inhibition: mute warnings while a critical alert with the same alertname and instance is firing
inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    equal: ["alertname", "instance"]
```
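The `match`, `source_match` and `target_match` fields still work, but Alertmanager has deprecated them since v0.22 in favor of `matchers`, `source_matchers` and `target_matchers`, which take PromQL-style matcher strings. An equivalent of the routing and inhibition rules in that form (only the changed parts shown) might look like:

```yaml
# alertmanager/config.yml (changed parts only)
route:
  # group_by, group_wait, group_interval, repeat_interval and receiver stay unchanged
  routes:
    - matchers:
        - severity="critical"
      receiver: "critical-alerts"
      continue: true
    - matchers:
        - severity="warning"
      receiver: "warning-alerts"
      group_wait: 5m

inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ["alertname", "instance"]
```

The matcher syntax also supports `!=`, `=~` and `!~`, which the old `match`/`match_re` pair needed two separate fields for.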
📊 Grafana Datasource Provisioning
grafana/provisioning/datasources/prometheus.yml:
```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: false
```
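The `loki` service mounts `./loki/config.yml`, which the template does not show, and the file must exist before the stack starts. A minimal single-node sketch, modelled on the local configuration shipped with Loki 3.x (assumed here because the compose file uses the `latest` tag) and on filesystem storage under `/loki`, which the `./loki/data` volume provides:

```yaml
# loki/config.yml (sketch for a single-node Loki 3.x)
auth_enabled: false

server:
  http_listen_port: 3100

common:
  instance_addr: 127.0.0.1
  # Matches the ./loki/data:/loki volume in docker-compose.yml
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    # Any date before the first ingested log works for a fresh install
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
```

Older Loki 2.x releases may not accept the `tsdb`/`v13` combination; check the schema options for the version you pin.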
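🚀 Startup
```bash
# Run from the monitoring/ directory
# Create the data directories
mkdir -p prometheus/data grafana/data alertmanager/data loki/data

# Set ownership to the UIDs the containers run as
sudo chown -R 65534:65534 prometheus/data alertmanager/data  # nobody (Prometheus, Alertmanager)
sudo chown -R 472:472 grafana/data                           # grafana
sudo chown -R 10001:10001 loki/data                          # loki

# Start the stack
docker compose up -d

# Follow the logs
docker compose logs -f
```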
🔗 Access URLs
| Service | URL | Default Login |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin123 |
| Prometheus | http://localhost:9090 | - |
| Alertmanager | http://localhost:9093 | - |
| Node Exporter | http://localhost:9100/metrics | - |
| cAdvisor | http://localhost:8080 | - |
📈 Recommended Grafana Dashboards
After logging in to Grafana, open Dashboards → Import and enter one of these dashboard IDs:
- `1860`: Node Exporter Full
- `10619`: Docker Container & Host Metrics
- `3662`: Prometheus 2.0 Stats
- `13639`: Loki Dashboard
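Instead of importing dashboards by hand, the `grafana/provisioning/dashboards/dashboard.yml` file from the folder structure can tell Grafana to load JSON files such as `node-exporter.json` at startup. A minimal sketch, assuming the JSON files sit in the same directory, which the compose file mounts at `/etc/grafana/provisioning`; the provider name `default` is an arbitrary choice:

```yaml
# grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: ""
    type: file
    disableDeletion: false
    # How often Grafana rescans the directory for changed JSON files
    updateIntervalSeconds: 30
    allowUiUpdates: false
    options:
      # Container path of ./grafana/provisioning/dashboards
      path: /etc/grafana/provisioning/dashboards
```

Dashboards downloaded from grafana.com often contain a `${DS_PROMETHEUS}` datasource placeholder that only the UI import resolves; for file provisioning you may need to replace it with the datasource name `Prometheus` in the JSON.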
🔧 Additional Exporters
Postgres Exporter
```yaml
  # Add under services: in docker-compose.yml
  postgres-exporter:
    image: prometheuscommunity/postgres-exporter
    container_name: postgres-exporter
    restart: always
    environment:
      # Replace user, password and dbname with your own values
      DATA_SOURCE_NAME: "postgresql://user:password@postgres:5432/dbname?sslmode=disable"
    ports:
      - "127.0.0.1:9187:9187"
    networks:
      - monitoring
```
Redis Exporter
```yaml
  # Add under services: in docker-compose.yml
  redis-exporter:
    image: oliver006/redis_exporter
    container_name: redis-exporter
    restart: always
    environment:
      REDIS_ADDR: "redis:6379"
      REDIS_PASSWORD: "your-redis-password"
    ports:
      - "127.0.0.1:9121:9121"
    networks:
      - monitoring
```
Nginx Exporter
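```yaml
  # Add under services: in docker-compose.yml
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter
    container_name: nginx-exporter
    restart: always
    command:
      # Exporter 1.x uses double-dash flags; nginx must serve stub_status at this URL
      - "--nginx.scrape-uri=http://nginx:8080/stub_status"
    ports:
      - "127.0.0.1:9113:9113"
    networks:
      - monitoring
```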
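The `blackbox` scrape job in `prometheus.yml` sends its probes to `blackbox-exporter:9115`, but the compose file defines no such service. A sketch of one, assuming the official `prom/blackbox-exporter` image, whose bundled default configuration already includes the `http_2xx` module the job uses:

```yaml
  # Blackbox Exporter - HTTP/HTTPS probes (add under services: in docker-compose.yml)
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    restart: always
    ports:
      - "127.0.0.1:9115:9115"
    networks:
      - monitoring
```

A single probe can be checked by hand at `http://localhost:9115/probe?target=https://example.com&module=http_2xx`; `probe_success 1` in the output means the check passed.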
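🛡️ Security Recommendations
> [!WARNING]
> For production environments:
> - Change the Grafana admin password (`GF_SECURITY_ADMIN_PASSWORD`).
> - Bind ports to localhost only (already done in this template).
> - Put the web UIs behind an Nginx reverse proxy with SSL/TLS.
> - Move the Alertmanager SMTP password out of `alertmanager/config.yml`. Alertmanager does not expand environment variables in its config file, so a `.env` file alone is not enough: generate the file at deploy time, or use a file-based secret option if your Alertmanager version supports one (for example `smtp_auth_password_file`).
> - Adjust the retention period to your needs (`--storage.tsdb.retention.time`, 30 days in this template).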
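For secrets that live in `docker-compose.yml` itself, Compose substitutes `${...}` variables from a `.env` file in the same directory. A sketch, assuming hypothetical variable names `GRAFANA_ADMIN_PASSWORD` and `REDIS_PASSWORD`; these keys replace the corresponding hard-coded values in the existing service definitions:

```yaml
# docker-compose.yml (excerpt). Example .env next to it (keep it out of version control):
#   GRAFANA_ADMIN_PASSWORD=change-me
#   REDIS_PASSWORD=change-me-too
services:
  grafana:
    environment:
      # ${VAR:?message} makes docker compose fail with this message if VAR is unset or empty
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:?set GRAFANA_ADMIN_PASSWORD in .env}

  redis-exporter:
    environment:
      REDIS_PASSWORD: "${REDIS_PASSWORD:?set REDIS_PASSWORD in .env}"
```

This substitution only applies to `docker-compose.yml`; mounted files such as `alertmanager/config.yml` reach the containers unchanged. Grafana also applies `GF_SECURITY_ADMIN_PASSWORD` only when it first creates the admin user, so set it before the first start.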