Question 1

Where do you deploy production workloads?

Accepted Answer

Primarily Cloudflare and AWS. We’re cloud-pragmatic: the right platform depends on the workload, compliance requirements and the client’s existing commitments. Our applications are built on standard modern web technologies so they travel cleanly between platforms.

Question 2

Do you provide managed databases?

Accepted Answer

Yes. PostgreSQL, MySQL and Redis are our most common managed offerings, provisioned via infrastructure-as-code and monitored alongside the application. Backups, point-in-time recovery and patching are included.

Question 3

Who responds when a production system breaks at 3am?

Accepted Answer

For systems we operate, we do. We run a documented on-call rotation with paging, escalation paths and written runbooks. Sev-1 incidents get a 30-minute response during business hours, or around the clock under a contracted 24/7 rotation. For systems operated by your team, we help you stand up the same practices and train the people carrying the pager. (‘On-call’ is the standard engineering term for the person currently responsible for responding to production incidents.)

Question 4

What does ‘observability-first’ mean in practice?

Accepted Answer

Structured logs, metrics and traces wired from day one, with dashboards and alerts focused on the indicators your team would actually act on. If an alert fires and nobody does anything, we either fix the alert or delete it.
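As a rough illustration of what ‘structured logs’ can look like, here is a minimal Python sketch that emits one JSON object per log line using only the standard library. The `JsonFormatter` class, the `build_logger` helper and the `fields` key are hypothetical names chosen for this example, not a description of our production tooling.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention for this sketch: callers attach structured
        # context through extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

A call such as `log.info("payment failed", extra={"fields": {"order_id": "A1"}})` then produces a line that a log pipeline can filter on `order_id` without parsing free text.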

Question 5

How do you handle DDoS and abuse?

Accepted Answer

We lean on Cloudflare’s network for DDoS mitigation, WAF and bot protection in front of most of our production workloads. Application-layer controls (rate limiting, auth hardening, audit logging) are layered on top.
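To make ‘rate limiting’ concrete, the sketch below shows one common approach, a token bucket, in plain Python. It is an illustrative example under assumed parameters (the `TokenBucket` class, its `capacity` and `refill_per_sec` arguments are hypothetical), not the specific control we deploy on any given engagement.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Top the bucket up for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per client key (for example per API token or IP address), and a rejected request would typically receive an HTTP 429 response.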

Question 6

What’s your uptime target?

Accepted Answer

We target 99.9% monthly uptime for production workloads, with a 30-minute Sev-1 response during business hours or 24/7 under a contracted on-call rotation. See our SLA for full details.
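For a sense of scale, a 99.9% monthly target leaves roughly 43 minutes of allowable downtime in a 30-day month. The small Python sketch below shows the arithmetic; the function name and the 30-day default are assumptions for illustration, and the SLA remains the authoritative definition of how uptime is measured.

```python
def downtime_budget_minutes(target: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a month of `days` days at the given uptime target.

    `target` is a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be a fraction in (0, 1]")
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - target)


if __name__ == "__main__":
    # 30 days = 43,200 minutes; 0.1% of that is about 43.2 minutes.
    print(round(downtime_budget_minutes(0.999), 1))
```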

Question 7

What are your RPO and RTO targets?

Accepted Answer

These are defined per system in the engagement contract. Typical production targets are an RPO of 15 minutes or less and an RTO of 4 hours or less. Backups are automated, encrypted and, where the workload calls for it, cross-region. We test restore procedures quarterly, and no backup is considered valid until it has been successfully restored in a drill.
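The typical targets above can be expressed as simple checks. The following Python sketch is a hedged illustration only: the function names are hypothetical, the 15-minute and 4-hour defaults come from the typical figures quoted here rather than any specific contract, and the 92-day window is our assumed reading of ‘quarterly’.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(minutes=15)  # typical target quoted above
RTO = timedelta(hours=4)     # typical target quoted above
DRILL_MAX_AGE = timedelta(days=92)  # assumption: one quarter, rounded up


def meets_rpo(last_recovery_point: datetime, now: datetime,
              rpo: timedelta = RPO) -> bool:
    """True if the newest recoverable point is no older than the RPO."""
    return now - last_recovery_point <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta = RTO) -> bool:
    """True if a measured restore (e.g. from a drill) finished within the RTO."""
    return restore_duration <= rto


def drill_is_current(last_successful_drill: datetime, now: datetime,
                     max_age: timedelta = DRILL_MAX_AGE) -> bool:
    """True if a successful restore drill happened within the assumed quarterly window."""
    return now - last_successful_drill <= max_age
```

Checks like these are most useful when fed from real measurements, such as the timestamp of the latest recovery point and the wall-clock duration of the last restore drill.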

Question 8

How do you handle data residency requirements?

Accepted Answer

The primary region for every workload is chosen during discovery based on your regulatory and commercial requirements. Backups and replicas are kept in-region (or in an approved set of regions) by default. Data residency commitments and any cross-border transfers are recorded in the engagement-level data flow documentation, which becomes a deliverable you can show an auditor.

Question 9

Can you operate a system you didn’t build?

Accepted Answer

Yes. We start with a codebase and infrastructure review, then propose a stabilisation plan before we agree to any SLA commitments. We don’t take on production on-call responsibility for a system we haven’t had time to understand: that’s a fast path to an outage neither party wants.

Ship faster. Stay up. Sleep at night.

Faster, safer releases

Fewer surprise outages

An audit trail you can show a regulator

A discovery sprint, a written proposal, then we ship.

Discovery sprint

Scoped build

Operate or hand over

Platform & Reliability Engineering FAQ
