Production Hardening: What It Actually Takes
By Forge IT Systems Engineering
Every developer has shipped something that works perfectly on localhost and breaks the first time a real user touches it. A form that crashes on empty input. An API that returns a 500 when the database times out. An admin panel that goes blank when one widget fails to load.
Production hardening is the unglamorous work of closing that gap. Here's what it looked like for our own site.
API Endpoint Hardening
We have around a dozen API routes: public endpoints for contact form submissions and analytics tracking, plus authenticated admin endpoints for blog management, contact management, analytics, and settings.
Every endpoint got the same treatment:
1. Parse the Request Body Safely
This is the most commonly skipped validation. If someone sends Content-Type: application/json with a malformed body such as {broken, your API shouldn't return a 500. We wrap every req.json() call in a try/catch and return a 400 with a clear message on failure.
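A minimal sketch of that pattern, assuming a route handler that receives a standard Web Request object (as Next.js App Router handlers do). The helper name parseJsonBody and the error message are illustrative, not copied from our codebase.

```typescript
// Result of attempting to parse a request body: either the parsed data
// or a ready-made 400 response to hand back to the client.
type ParseResult =
  | { ok: true; data: unknown }
  | { ok: false; response: Response };

// Hypothetical helper: wraps req.json() so malformed JSON becomes a 400
// instead of an unhandled exception (and therefore a 500).
async function parseJsonBody(req: Request): Promise<ParseResult> {
  try {
    return { ok: true, data: await req.json() };
  } catch {
    const body = JSON.stringify({ error: "Request body must be valid JSON" });
    return {
      ok: false,
      response: new Response(body, {
        status: 400,
        headers: { "Content-Type": "application/json" },
      }),
    };
  }
}
```

A handler would call this first and return parsed.response immediately when parsed.ok is false, so nothing downstream ever sees an unparsed body.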
2. Validate Before Mutating
Every write operation validates the request body against a Zod schema before touching the database. Invalid data gets a 400 with field-level error messages. Valid data proceeds.
Zod schemas are shared between the client form validation and the API. One source of truth for what constitutes valid input.
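As a rough illustration, a shared schema and a validation helper might look like the following. The contact-form field names, limits, and messages are assumptions made for the example; only the use of Zod comes from our setup.

```typescript
import { z } from "zod";

// Hypothetical contact-form schema. The same object can be imported by the
// client form and by the API route, so both agree on what "valid" means.
const contactSchema = z.object({
  name: z.string().min(1, "Name is required"),
  email: z.string().email("Enter a valid email address"),
  message: z.string().min(10, "Message must be at least 10 characters"),
});

type ContactInput = z.infer<typeof contactSchema>;

type ValidationResult =
  | { ok: true; data: ContactInput }
  | { ok: false; status: 400; fieldErrors: Record<string, string[] | undefined> };

// Validate untrusted input before any database call is made.
function validateContact(input: unknown): ValidationResult {
  const result = contactSchema.safeParse(input);
  if (result.success) {
    return { ok: true, data: result.data };
  }
  // flatten() groups error messages by field name for field-level feedback.
  return { ok: false, status: 400, fieldErrors: result.error.flatten().fieldErrors };
}
```

Using safeParse rather than parse keeps invalid input on the normal control-flow path: a bad payload is an expected outcome, not an exception.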
3. Check Existence Before Mutation
Before updating or deleting a resource, we verify it exists. A DELETE request to a non-existent resource returns 404, not 500. This sounds obvious, but skipping this check means your database driver's error becomes your user's error message.
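The check itself is short. The sketch below uses an in-memory Map as a stand-in for the database so it stays runnable; the deletePost name and the response bodies are illustrative assumptions.

```typescript
// In-memory stand-in for a database table, used only to keep the sketch runnable.
const posts = new Map<string, { title: string }>([["1", { title: "Hello" }]]);

// Small helper to build a JSON response with a given status code.
function jsonResponse(status: number, body: unknown): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Hypothetical DELETE handler body: look the resource up first, and answer
// 404 ourselves rather than letting a driver error surface as a 500.
async function deletePost(id: string): Promise<Response> {
  const existing = posts.get(id);
  if (!existing) {
    return jsonResponse(404, { error: "Post not found" });
  }
  posts.delete(id);
  return jsonResponse(200, { deleted: id });
}
```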
4. Catch Everything Else
Every endpoint body is wrapped in a top-level try/catch that returns 500 with a generic error message. The actual error is logged server-side but never exposed to the client. No stack traces in production responses.
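One way to avoid repeating that try/catch in every route is a small wrapper, sketched here under the assumption that handlers take a Web Request and return a Response. The withErrorHandling name and the error text are hypothetical.

```typescript
type Handler = (req: Request) => Promise<Response>;

// Wraps a route handler so any unhandled error is logged on the server
// and the client only ever receives a generic 500.
function withErrorHandling(handler: Handler): Handler {
  return async (req) => {
    try {
      return await handler(req);
    } catch (err) {
      // Full details stay in the server logs.
      console.error("Unhandled error in API route:", err);
      return new Response(JSON.stringify({ error: "Internal server error" }), {
        status: 500,
        headers: { "Content-Type": "application/json" },
      });
    }
  };
}

// Example: a handler whose database call fails.
const failingHandler = withErrorHandling(async () => {
  throw new Error("connection to database timed out");
});
```

The response body never includes err.message or a stack trace, so internal details such as "connection to database timed out" do not leak to the caller.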
5. Rate Limiting
Public endpoints include IP-based rate limiting. We use a simple in-memory counter with a sliding window — 100 requests per IP per minute. Not sophisticated enough for a distributed system, but effective for a single-server deployment.
This prevents the most obvious abuse: automated form spam and analytics injection. It's the minimum viable defence, and it takes about 20 lines of code.
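For reference, one possible shape of such a limiter is sketched below. It keeps a list of request timestamps per IP, which is one way to implement a sliding window; the function and variable names are illustrative, and only the limit of 100 requests per minute comes from our configuration.

```typescript
const WINDOW_MS = 60 * 1000; // one minute
const MAX_REQUESTS = 100;    // allowed requests per IP per window

// Timestamps (in ms) of recent requests, keyed by client IP.
const hits = new Map<string, number[]>();

// Returns true if this request should be rejected (typically with a 429).
// `now` is injectable so the behaviour can be tested without waiting.
function isRateLimited(ip: string, now: number = Date.now()): boolean {
  const windowStart = now - WINDOW_MS;
  // Keep only the requests that still fall inside the sliding window.
  const recent = (hits.get(ip) ?? []).filter((t) => t > windowStart);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(ip, recent);
    return true;
  }
  recent.push(now);
  hits.set(ip, recent);
  return false;
}
```

Note that this sketch never removes entries for IPs that stop sending requests, so a long-running process would want a periodic cleanup pass. The state also lives in one process's memory, which is why the approach only suits a single-server deployment.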
Error Boundaries in React
Server-side errors in API routes are one thing. Client-side rendering crashes are another.
React error boundaries catch rendering failures and show a fallback UI instead of a blank screen. We added them at three levels:
Page-level error boundary (error.tsx): catches any unhandled error in the admin section. Shows a retry button and a link back to the dashboard. The user sees "Something went wrong" instead of a white screen.
Component-level try/catch in server components: the dashboard page renders stats and recent activity from the database. Each section is wrapped in its own try/catch. If the stats query fails, the activity feed still renders.
Network error handling in client components: the blog editor's save function, the contacts page's read-toggle — every fetch() call has error handling with user-facing feedback. "Network error. Check your connection" beats a silent failure.
The goal is never a blank screen. Something should always render, even if it's "Something went wrong. Try refreshing."
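The second and third levels can be sketched without any React-specific code. The two helpers below are illustrative assumptions (loadSection and savePost are hypothetical names): the first isolates one dashboard section's data loading so a failure falls back to a default, and the second turns fetch failures into a message the UI can display.

```typescript
// Run one section's loader; on failure, log and return a fallback so the
// other sections on the page still render.
async function loadSection<T>(
  name: string,
  load: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await load();
  } catch (err) {
    console.error(`Failed to load section "${name}":`, err);
    return fallback;
  }
}

type SaveResult = { ok: true } | { ok: false; message: string };

// Client-side save: both network failures and non-2xx responses become
// user-facing messages instead of silent failures.
// `fetchImpl` is injectable purely to make the function testable.
async function savePost(
  url: string,
  payload: unknown,
  fetchImpl: typeof fetch = fetch,
): Promise<SaveResult> {
  try {
    const res = await fetchImpl(url, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    if (!res.ok) {
      return { ok: false, message: `Save failed (${res.status}). Try again.` };
    }
    return { ok: true };
  } catch {
    // fetch rejects when the request never completes (offline, DNS failure, ...).
    return { ok: false, message: "Network error. Check your connection." };
  }
}
```

In a dashboard, each section would call loadSection with its own query and a fallback such as null, and render an "unavailable" placeholder when it receives that fallback.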
Loading States
A loading skeleton is better than a spinner. Skeletons communicate the shape of incoming content, reducing perceived load time. Our admin loading state mirrors the dashboard layout: a header bar, four stat card outlines, and two content card outlines. When the data arrives, it fills the same shapes — the visual transition is smooth.
Authentication and Middleware
The admin section is protected by NextAuth.js middleware. Every /admin/* route (except /admin/login) requires a valid session. Every /api/admin/* endpoint requires a valid session.
Unauthenticated requests to admin pages redirect to /admin/login with a callbackUrl parameter, so users land on their intended page after logging in. Unauthenticated API requests are intercepted by middleware before the route handler even runs.
This is a single point of enforcement. No per-route auth checks, no forgetting to protect a new endpoint. The middleware handles it.
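The routing rules described above boil down to a small decision function. The sketch below is a framework-free illustration of that logic, not the NextAuth.js API: the decide function and Decision type are hypothetical, and the 401 status for unauthenticated API requests is an assumption, since the rules above only say those requests are intercepted.

```typescript
type Decision =
  | { action: "next" }                        // let the request through
  | { action: "redirect"; location: string }  // send the browser to the login page
  | { action: "reject"; status: 401 };        // refuse an API call

// Decide what to do with a request given its path and whether a valid
// session exists. Everything outside the admin area is left alone.
function decide(pathname: string, hasSession: boolean): Decision {
  const isAdminApi = pathname.startsWith("/api/admin/");
  const isAdminPage = pathname === "/admin" || pathname.startsWith("/admin/");

  if (!isAdminApi && !isAdminPage) return { action: "next" };
  if (pathname === "/admin/login") return { action: "next" };
  if (hasSession) return { action: "next" };

  if (isAdminApi) return { action: "reject", status: 401 };

  // Remember where the user was headed so login can send them back.
  const callbackUrl = encodeURIComponent(pathname);
  return { action: "redirect", location: `/admin/login?callbackUrl=${callbackUrl}` };
}
```

Because every request passes through one function like this, a newly added /admin or /api/admin route is protected by default rather than by remembering to add a check.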
Input Validation at the Boundary
We validate at system boundaries: where user input enters the system. The contact form validates client-side (for UX) and server-side (for security). The blog editor validates against the same Zod schema on both sides.
Internal function calls don't validate their arguments. If db.blogPost.create receives bad data, that's a bug in our code, not a user input problem. Validating everywhere adds noise and hides the actual trust boundary.
What We Don't Do
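We don't add CSRF tokens manually. NextAuth.js protects its own sign-in and sign-out endpoints with a CSRF token, and its session cookie defaults to SameSite=Lax, so browsers don't attach it to cross-site POST requests against the admin API.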
We don't encrypt data at rest. The database stores blog posts and contact form submissions. Neither contains secrets. If your threat model requires encryption at rest, you probably need a different database layer entirely.
We don't implement retry logic. If a database write fails, we tell the user. We don't silently retry and hope it works. Retries are appropriate for idempotent reads in distributed systems. For a single-server web app, they add complexity without benefit.
We don't over-log. We log errors. We don't log successes, request bodies, or user actions. Excessive logging creates noise that makes real problems harder to find.
The Hardening Checklist
If you're hardening your own application, here's the minimum:
- Every req.json() is in a try/catch returning 400 on failure
- Every write validates input against a schema
- Every mutation checks resource existence returning 404 if missing
- Every route has a top-level try/catch returning 500 on unhandled errors
- Public endpoints have rate limiting
- React has error boundaries at page and section level
- Loading states exist for every async page
- Authentication middleware covers all protected routes
- Client-side fetch calls handle network errors
- No stack traces or internal details in production error responses
None of this is innovative. All of it is necessary.
Production hardening isn't about being clever. It's about being thorough. The best error handling is the kind users never see — because it caught the problem before it reached them.