Observability Tech Lead
Trade Republic
Please note that these positions are based in London, Berlin or Paris — relocation support is provided if required.
THE BEST WORK OF YOUR CAREER
Trade Republic is the largest savings platform in Europe: we operate in 18 countries and serve more than 10 million customers who have trusted us with over €150B in assets. But we’re striving for more.
We have a bold mission to empower everyone to build wealth with easy, safe, and free access to financial systems. You will have the opportunity to grow your career by collaborating with a team of outstanding talents and state-of-the-art technology to build a lasting, positive future for millions.
ABOUT PLATFORM ENGINEERING
Platform Engineering is the backbone of Trade Republic's engineering velocity. Our mission is to build scalable platforms for a Europe-scale bank — serving internal engineers, and building in-house control planes for managing the bank's infrastructure. We’re a ~50-person Platform team focused on one thing: enabling product engineers to move fast and operate autonomously by default.
We build self-service platforms, golden paths, and opinionated tooling so that over 400 engineers can ship with confidence. From Kubernetes fleet management and CI/CD to an internal Developer Hub built on Backstage, our work underpins every trade, savings plan, and card payment that flows through the platform.
THE OBSERVABILITY JOURNEY
In 2023 we made a decisive move: we replaced our observability-as-a-service provider with a fully self-hosted observability stack, giving us complete control over cost, data residency, and the developer experience around telemetry. Today our stack spans the full LGTM suite — Grafana, Mimir, Loki, and Tempo — alongside VictoriaMetrics, self-hosted Sentry, Grafana Alloy as our telemetry collector, and OpenTelemetry as the instrumentation standard. We use Pyrra for SLO tracking and are building toward a unified service health dashboard powered by error budgets and burn-rate alerting.
Telemetry is the backbone of how we operate a bank at scale — ingesting over 100 million samples, serving 400+ services, and capturing end-to-end traces from clients through services to system dependencies. Every trade, every card payment depends on our ability to see, measure, and respond to what's happening in production. We've proven the architecture works. Now we're building a dedicated in-house observability team to take it to the next level: stabilise and harden the platform, drive down cost-per-signal, and build the golden path for observability — where 100% of components ship with production-grade telemetry because the best thing to do is the easiest thing to do.
WHAT YOU'LL BE DOING
- Build and evolve the observability platform: Design and operate large-scale telemetry pipelines while continuously improving core components with a strong focus on automation, reliability, and developer experience.
- Build for scale, design for cost: Architect high-throughput telemetry systems with sampling strategies, data tiering, and retention policies that balance signal fidelity with infrastructure cost at scale.
- Make production observable by default: Define and implement observability and reliability standards — SLOs, error budgets, and low-noise alerting — and actively support engineering teams in adopting them, making doing the right thing effortless.
- Own the platform end to end: Participate in the on-call rotation for the observability platform, ensuring full end-to-end ownership of the systems you build and operate.
- Own the direction and drive it forward: Define long-term observability direction, drive cross-team initiatives from kickoff to delivery, and align observability strategy with broader engineering reliability and business goals.
WHAT WE'RE LOOKING FOR
- 5+ years of experience in observability, platform engineering, or a related SRE/infrastructure discipline.
- We are hiring from senior to staff level, so whether you have a strong foundation and are ready for more ownership or you have been leading observability strategy for large-scale systems for many years, we would love to hear from you.
- Deep hands-on expertise with a modern observability stack (Prometheus, OpenTelemetry, Grafana, or equivalent) at scale. Hands-on experience with Mimir, Loki, and Tempo architectures is a strong plus.
- Proven ability to design and operate high-throughput telemetry pipelines across distributed, multi-cloud environments.
- Strong command of SLO-based reliability practices — error budgets, burn-rate alerting, and incident response tooling.
- Cloud-native in your DNA: hands-on with Kubernetes, Terraform, and running production workloads on AWS, GCP, or Azure.
- A track record of turning observability best practices into opinionated standards that engineering teams actually adopt.
- Experience driving cross-team technical initiatives end-to-end, from ambiguous problem to shipped solution.
- Ability to contribute to architectural decisions and clearly communicate trade-offs to both engineers and leadership.
- The ability to work in a flexible hybrid setup, with 2-3 days a week in the office.
WHY YOU SHOULD APPLY NOW
Our culture rewards ownership, excellence, and high energy. We care deeply about outcomes and hold each other accountable. We're here to win and to tackle one of the largest challenges Europeans face: closing the pension gap and democratising wealth. If this gets you fired up, reach out!
We believe it’s our team’s varied identities and backgrounds that make us sharper and stronger. We’re committed to creating an environment where everyone feels respected and has equal opportunity to thrive in their careers. For any questions on DEI during the interview process, reach out to your recruitment partner.