
Infrastructure Monitoring

Real-time metrics, alerting and observability with Grafana-style dashboards

https://grafana.pyco.cloud/d/overview
Infrastructure Overview
Last 6 hours
CPU Usage: 67% (↓ 3% from avg)
Memory: 82% (↑ 5% from avg)
Network I/O: 1.2 GB/s (↑ 12% from avg)
Requests/sec: 4,521 (↑ 8% from avg)
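The dashboard does not say how the "from avg" deltas are computed. A minimal sketch, assuming each delta is the relative change of the current value against the mean of the samples in the selected 6-hour range (not a difference in percentage points). The functions `change_from_avg` and `format_delta` are hypothetical:

```python
from statistics import mean


def change_from_avg(current: float, window: list[float]) -> float:
    """Relative change (%) of `current` versus the mean of `window`.

    Assumption: "from avg" means the mean over the dashboard's time range,
    and the delta is relative, not a difference in percentage points.
    """
    if not window:
        raise ValueError("window must contain at least one sample")
    avg = mean(window)
    if avg == 0:
        raise ValueError("average is zero; relative change is undefined")
    return (current - avg) / avg * 100.0


def format_delta(delta: float) -> str:
    """Render a delta the way the stat cards show it, e.g. '↑ 8% from avg'."""
    arrow = "↑" if delta >= 0 else "↓"
    return f"{arrow} {abs(delta):.0f}% from avg"
```

With hypothetical request-rate samples averaging 4,186 req/s, a current value of 4,521 req/s formats as "↑ 8% from avg".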
System Resources (chart: CPU, Memory, and Network usage over time, percentage scale)
API Gateway: 99.99% uptime, avg latency 45ms
Database: 99.95% uptime, avg latency 12ms
Cache Layer: 98.5% uptime, avg latency 8ms
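Uptime percentages are easier to compare as a downtime budget. The dashboard does not state the period its uptime figures cover, so this sketch assumes a 30-day window. `allowed_downtime` is a hypothetical helper:

```python
from datetime import timedelta


def allowed_downtime(uptime_pct: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Downtime permitted by an uptime percentage over `period` (assumed 30 days)."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    # timedelta * float is exact up to microsecond rounding.
    return period * ((100.0 - uptime_pct) / 100.0)
```

Over 30 days, 99.99% allows about 4.3 minutes of downtime, 99.95% about 21.6 minutes, and 98.5% about 10.8 hours. This is why the Cache Layer's figure stands out against the other services.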
Active Alerts
Alert Rules: 3 firing
🔴 High Memory Usage - cache-01: Memory usage exceeded 90% threshold (5m ago)
🟡 Elevated Error Rate - api-gateway: Error rate above 1% for 10 minutes (12m ago)
🟡 High Latency - database-primary: P99 latency exceeded 500ms (18m ago)
🟢 CPU Usage Normalized: CPU usage returned below 80% (1h ago)
🟢 Disk Space Cleared: Disk usage returned below 70% (2h ago)
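The alert "Error rate above 1% for 10 minutes" implies a rule whose condition must hold continuously for a duration before it fires. This resembles the `for` clause in Prometheus alerting rules, where an alert stays pending until its condition has held for the configured time. The sketch below models that inactive/pending/firing cycle in plain Python. `ThresholdRule` and its fields are hypothetical, not the dashboard's or Prometheus's actual implementation. Values are in percent to match the alerts above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


@dataclass
class ThresholdRule:
    name: str
    threshold: float
    for_seconds: float = 0.0  # how long the condition must hold before firing
    state: State = State.INACTIVE
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, value: float, now: float) -> State:
        """Update and return the rule state for one evaluation at time `now` (seconds)."""
        if value > self.threshold:
            if self.active_since is None:
                self.active_since = now
            held = now - self.active_since
            self.state = State.FIRING if held >= self.for_seconds else State.PENDING
        else:
            # Condition cleared: reset. A real system would also send a "resolved" notice.
            self.active_since = None
            self.state = State.INACTIVE
        return self.state
```

For example, `ThresholdRule("Elevated Error Rate - api-gateway", threshold=1.0, for_seconds=600)` stays pending for the first ten minutes of elevated errors and fires only once they persist. A rule with the default `for_seconds=0`, such as a 90% memory threshold, fires on the first evaluation above the limit.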
Service Health
API Gateway: 99.99% uptime · Latency: 45ms · RPS: 2,450
Auth Service: 99.98% uptime · Latency: 28ms · RPS: 890
Database Primary: 99.95% uptime · Latency: 12ms · QPS: 5,200
Cache Layer: 98.5% uptime · Latency: 8ms · Hit rate: 94%
Message Queue: 99.97% uptime · Latency: 5ms · Throughput: 12K/s
Search Service: 99.92% uptime · Latency: 85ms · RPS: 320
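The cache's 94% hit rate is the share of lookups served from the cache. Cache servers often expose cumulative hit and miss counters, so this sketch assumes two counter snapshots taken at the start and end of a window and computes the rate from their differences. The `CounterSnapshot` type and `hit_rate` function are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    hits: int    # cumulative cache hits since the cache process started
    misses: int  # cumulative cache misses since the cache process started


def hit_rate(start: CounterSnapshot, end: CounterSnapshot) -> float:
    """Hit rate (%) for the lookups that happened between two snapshots."""
    hits = end.hits - start.hits
    misses = end.misses - start.misses
    if hits < 0 or misses < 0:
        # Counters went backwards, e.g. the cache restarted; the window is unusable.
        raise ValueError("counter reset detected")
    lookups = hits + misses
    if lookups == 0:
        raise ValueError("no lookups in this window")
    return hits / lookups * 100.0
```

Working from counter deltas rather than lifetime totals keeps the figure responsive to recent behavior. A cache that was warm for weeks but is now missing would otherwise still report a high rate.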
Settings
Data Sources: Prometheus, Loki, Jaeger
Notification Channels: Slack #alerts, PagerDuty, Email Team
Retention: Metrics 30 days · Logs 14 days · Traces 7 days
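The retention settings decide how far back each signal can be queried. In practice each backend enforces its own retention; Prometheus, for instance, uses its `--storage.tsdb.retention.time` flag. The sketch below shows only the cutoff arithmetic, with a hypothetical `RETENTION` table mirroring the periods in the settings:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table mirroring the dashboard's retention settings.
RETENTION = {
    "metrics": timedelta(days=30),
    "logs": timedelta(days=14),
    "traces": timedelta(days=7),
}


def retention_cutoff(signal: str, now: datetime) -> datetime:
    """Oldest timestamp still retained for `signal` at time `now` (KeyError if unknown)."""
    return now - RETENTION[signal]


def is_retained(signal: str, ts: datetime, now: datetime) -> bool:
    """True if data stamped `ts` is still inside the retention window."""
    return ts >= retention_cutoff(signal, now)
```

Because traces expire after 7 days, an incident older than a week can still be investigated through metrics and logs but no longer through its traces.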