Once the strategy is set (a two-pillar approach, an open-source stack), it still has to be executed. That is often where projects get bogged down: excessive ambition, a big-bang rollout, no visible value for the first 12 months…
Our conviction: a successful transition to observability happens in 4 progressive phases, each delivering immediate value that justifies the next. No long stretch without visible results, no promises that can't be kept.
Phase 1 — Scoping and audit
- Inventory of what’s already in place: tools, agents, covered scopes, gaps. You can’t plan the future without understanding the present.
- Mapping needs by team: network admins, DevOps, developers, IT leadership, security. Each has different needs and expects the information in a different format.
- Defining objectives and success metrics: target MTTR, detection rate, managed volume, and so on.
- Tooling strategy decisions: keep/replace/add, hosting (on-premise vs. cloud), sovereignty.
Typical deliverables: a matrix of existing tools, a map of needs, and a target architecture document.
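A success metric such as target MTTR is only useful if everyone computes it the same way. As an illustration, here is a minimal sketch that reads MTTR as the mean time between detection and resolution (one of several common definitions). The incident records and the 60-minute target are hypothetical values, not figures from the white paper.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average duration between detection and resolution of incidents."""
    durations = [resolved - detected for detected, resolved in incidents]
    if not durations:
        raise ValueError("no incidents to average")
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incident log: (detected_at, resolved_at)
incidents = [
    (datetime(2026, 1, 5, 9, 0), datetime(2026, 1, 5, 10, 30)),     # 90 min
    (datetime(2026, 1, 12, 14, 0), datetime(2026, 1, 12, 14, 45)),  # 45 min
    (datetime(2026, 1, 20, 22, 15), datetime(2026, 1, 20, 23, 0)),  # 45 min
]

mttr = mean_time_to_restore(incidents)
target = timedelta(minutes=60)  # hypothetical objective set during the audit
print(f"MTTR: {mttr} (target met: {mttr <= target})")
```

Measuring this baseline during the audit gives later phases something concrete to be compared against.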
Phase 2 — Foundations
- Consolidating the infrastructure monitoring pillar (removing redundancies, updating tools, covering the blind spots you identified).
- Deploying the observability stack (VictoriaMetrics backend, Grafana, baseline configuration).
- Setting up the OpenTelemetry Collector as a pilot on a limited scope.
- First Grafana dashboards bringing the two pillars together.
By the end of this phase, the organization has two functioning pillars and an integration layer. That alone is a significant gain in visibility.
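One way to confirm that the foundations are actually in place is a small smoke check against the metrics backend. The sketch below relies on the Prometheus-compatible query API that VictoriaMetrics exposes (`/api/v1/query`) and lists the targets whose `up` metric is 0. The host name, port, and sample response are assumptions made for the example; the response is hard-coded so the sketch runs without a live backend.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a Prometheus-compatible instant-query URL."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def targets_down(response_body):
    """Return the `instance` labels whose sample value is 0 in an `up` query result."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return sorted(
        series["metric"].get("instance", "<unknown>")
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0.0
    )


# Hypothetical backend address (assumption, adapt to your deployment).
url = instant_query_url("http://vm.example.internal:8428", "up")

# Hard-coded response in the Prometheus instant-query format, standing in
# for what an HTTP GET on `url` would return.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "app-01:9100"},
             "value": [1767600000, "1"]},
            {"metric": {"__name__": "up", "instance": "switch-02:9100"},
             "value": [1767600000, "0"]},
        ],
    },
})

print(url)
print("targets down:", targets_down(sample_response))
```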
Phase 3 — Progressive instrumentation
Instrumentation rolls out in three waves, each delivering immediate value:
| | Wave 1: Metrics | Wave 2: Logs | Wave 3: Traces |
|---|---|---|---|
| Effort | Low | Moderate | Significant |
| What it involves | Prometheus/OTLP endpoints | Structured format + trace ID | OTel SDK / auto-instrumentation |
| Value | Dashboards, alerting | Context, investigation | Root cause |
This progression avoids a long stretch with nothing to show: as early as Wave 1, teams have relevant dashboards and alerts. Wave 2 enriches the context. Wave 3 enables root-cause analysis.
In parallel: setting up unified alerting, training the teams, and knowledge transfer on the new tools.
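To make Wave 2 more concrete, here is a minimal sketch of structured logging with a trace ID, using only the Python standard library. The field names, the `checkout` service name, and the locally generated trace ID are assumptions for the example; in a real deployment the ID would normally come from the tracing context (for instance the OTel SDK in Wave 3) rather than being generated by hand.

```python
import json
import logging
import secrets


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set through `extra=`; None when the caller did not provide one.
            "trace_id": getattr(record, "trace_id", None),
        })


def new_trace_id():
    """16 random bytes as 32 hex characters (the W3C Trace Context size)."""
    return secrets.token_hex(16)


logger = logging.getLogger("checkout")  # hypothetical service name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate output through the root logger

# Every log line emitted while handling one request carries the same ID.
trace_id = new_trace_id()
logger.info("payment authorized", extra={"trace_id": trace_id})
logger.error("inventory service timeout", extra={"trace_id": trace_id})
```

Because both lines share the same `trace_id`, an investigation can pivot from one log entry to every other event of the same request, and later to the trace itself.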
Phase 4 — Optimization and maturity
- Cross-layer correlation (from infrastructure to application): automatically linking a CPU alert to an application log and a user trace.
- Anomaly detection and proactive alerting: moving from reactive alerting (thresholds) to predictive detection.
- Cost optimization (retention, downsampling, cleaning up unused series).
- Continuous improvement of dashboards and user experience.
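The difference between threshold-based alerting and anomaly detection can be illustrated with a small example. The sketch below compares a static threshold with a rolling z-score, which flags values that deviate strongly from the recent baseline. It is a deliberately simple statistical baseline, not a forecasting model, and the CPU series, window size, and limits are hypothetical.

```python
from statistics import mean, stdev


def static_threshold_alerts(samples, threshold):
    """Indices of samples above a fixed threshold (reactive alerting)."""
    return [i for i, value in enumerate(samples) if value > threshold]


def rolling_zscore_alerts(samples, window=5, z_limit=3.0):
    """Indices of samples far from the mean of the preceding `window` samples."""
    alerts = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: z-score undefined, skip this sample
        if abs(samples[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Hypothetical CPU usage (%), with an unusual jump at index 6.
cpu = [31, 30, 32, 29, 31, 30, 58, 31, 30, 32]

print("static threshold (80%):", static_threshold_alerts(cpu, 80))
print("rolling z-score:", rolling_zscore_alerts(cpu))
```

On this series the 80% threshold never fires, while the rolling baseline flags the jump to 58% at index 6: the value is unusual for this host even though it is not critical in absolute terms.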
How long does it take?
It varies widely depending on the context: size of the IT estate, team maturity, application complexity. Some rough figures observed in the field:
- Phase 1 (audit) — 4 to 8 weeks
- Phase 2 (foundations) — 2 to 4 months
- Phase 3 (instrumentation) — 4 to 9 months depending on the number of applications
- Phase 4 (optimization) — ongoing, with no end date
All in all, expect 12 to 18 months to reach genuine maturity. But the first tangible value appears as early as Phase 2, roughly 3–4 months after kickoff.
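The phase ranges can be added up to see where these figures come from. The sketch below assumes the phases run strictly one after another and approximates "4 to 8 weeks" as 1 to 2 months; both are simplifications, since in practice phases may overlap.

```python
# Phase durations in months (min, max), taken from the ranges listed above.
# "4 to 8 weeks" is approximated as 1 to 2 months (assumption).
phases = [
    ("Phase 1 (audit)", 1, 2),
    ("Phase 2 (foundations)", 2, 4),
    ("Phase 3 (instrumentation)", 4, 9),
]


def cumulative_timeline(phases):
    """Return (name, earliest_end, latest_end) in months since kickoff."""
    timeline, low, high = [], 0, 0
    for name, min_months, max_months in phases:
        low, high = low + min_months, high + max_months
        timeline.append((name, low, high))
    return timeline


for name, low, high in cumulative_timeline(phases):
    print(f"{name}: ends between month {low} and month {high}")
```

Under these assumptions, Phase 2 ends between month 3 and month 6, and Phase 3 between month 7 and month 15. Phase 4 then continues with no end date.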
Pitfalls to avoid
- Trying to cover everything at once — start with 2–3 critical applications in Phase 3
- Underestimating change management — teams need to adopt the new tools, not have them forced on them
- Neglecting data volume — picking a backend that can’t handle the load dooms the project
- Forgetting hidden costs — agents, inter-cloud transfers, training, support
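On data volume, a back-of-the-envelope estimate made before choosing a backend can already rule out options. The sketch below derives the ingestion rate and raw storage from the number of active series, the scrape interval, and the retention period. Every number is hypothetical, and `bytes_per_sample` in particular is a placeholder: the real figure depends on the backend and its compression, and should be measured on your own data.

```python
SECONDS_PER_DAY = 86_400


def estimate_ingestion(active_series, scrape_interval_s, retention_days,
                       bytes_per_sample):
    """Return (samples per second, storage in GB) for a metrics workload."""
    samples_per_second = active_series / scrape_interval_s
    total_samples = samples_per_second * SECONDS_PER_DAY * retention_days
    storage_gb = total_samples * bytes_per_sample / 1e9
    return samples_per_second, storage_gb


# Hypothetical estate: 500 hosts exposing about 1,000 series each,
# scraped every 30 s and kept for 90 days.
rate, storage_gb = estimate_ingestion(
    active_series=500 * 1_000,
    scrape_interval_s=30,
    retention_days=90,
    bytes_per_sample=1.0,  # placeholder assumption, measure on your backend
)
print(f"{rate:,.0f} samples/s, about {storage_gb:,.1f} GB over the retention period")
```

Logs and traces typically need a separate estimate, since their volume depends on event size rather than on series count.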
Successful observability isn’t a technical project: it’s a transformation project that touches architecture, organization, and operational culture.
This article is drawn from our white paper “From Monitoring to Observability” (PDF, 2026). To discuss your own situation, get in touch.