Auth-svc
Migration Plan
Add gRPC to the existing NestJS auth-svc, deploy a KrakenD gateway for an immediate 10–18ms latency reduction per request, migrate all 13 services, then rewrite auth-svc in Go.
Playwright E2E
Build a comprehensive test suite validating all auth-svc behavior through its public interfaces. Safety net for every subsequent phase and acceptance gate for the Go rewrite.
1.1 Environment Setup
- Playwright test project in auth-svc-e2e/
- Docker Compose: auth-svc, PostgreSQL, RabbitMQ, Valkey in isolation
- Seed scripts with deterministic data: users, roles, permissions, tokens, secret keys
- Pinned environment variables for reproducibility
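One way to keep seed data deterministic is to derive every identifier from a fixed key instead of generating random values, so each run of the suite sees identical rows. The following is a minimal sketch under that assumption; `fnv1a`, `seedId`, `buildSeedUsers`, and the `SeedUser` fields are hypothetical names, not the real auth-svc schema.

```typescript
// Hypothetical seed helpers; names and fields are illustrative only.

// 32-bit FNV-1a hash: the same input always yields the same output.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Derive a stable identifier from a record kind and a human-readable key.
function seedId(kind: string, key: string): string {
  const hex = ("00000000" + fnv1a(`${kind}:${key}`).toString(16)).slice(-8);
  return `${kind}-${hex}`;
}

interface SeedUser {
  id: string;
  email: string;
  roles: string[];
  disabled: boolean;
}

// Fixed user set covering the login cases: two active users and one disabled user.
function buildSeedUsers(): SeedUser[] {
  const specs = [
    { key: "admin", roles: ["admin"], disabled: false },
    { key: "viewer", roles: ["viewer"], disabled: false },
    { key: "disabled", roles: ["viewer"], disabled: true },
  ];
  return specs.map((spec) => ({
    id: seedId("user", spec.key),
    email: `${spec.key}@example.test`,
    roles: spec.roles,
    disabled: spec.disabled,
  }));
}
```

Because nothing depends on the clock or a random source, two runs of `buildSeedUsers()` produce byte-identical output, which keeps test assertions and contract snapshots stable.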
1.2 Authentication Flow Tests
- Login — valid creds, wrong password, nonexistent user, disabled user
- Logout — token invalidation verified on subsequent requests
- Refresh — valid token, expired token, already-used token
1.3 Token Validation Tests
- Valid Bearer → 200 + user with roles/permissions
- Expired, revoked, malformed, wrong-secret, tampered tokens → 401
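The negative cases above need tokens that are wrong in exactly one way each. A sketch of how such fixtures could be built, assuming auth-svc issues HS256 JWTs signed with a shared secret; the claim names and the helper names (`signToken`, `checkToken`, `buildTokenFixtures`) are hypothetical. Revoked tokens depend on server-side state and are not covered by this local sketch.

```typescript
import { createHmac } from "node:crypto";

type TokenCheck = "valid" | "expired" | "bad-signature" | "malformed";

// Encode a JSON value as a base64url JWT segment.
function encodeSegment(value: object): string {
  return Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
}

// Sign claims as an HS256 JWT (assumed algorithm).
function signToken(claims: object, secret: string): string {
  const head = encodeSegment({ alg: "HS256", typ: "JWT" });
  const body = encodeSegment(claims);
  const sig = createHmac("sha256", secret).update(`${head}.${body}`).digest("base64url");
  return `${head}.${body}.${sig}`;
}

// Local oracle for the fixtures; plain string comparison is acceptable for test data only.
function checkToken(token: string, secret: string, nowSec: number): TokenCheck {
  const parts = token.split(".");
  if (parts.length !== 3) return "malformed";
  const [head, body, sig] = parts as [string, string, string];
  const expected = createHmac("sha256", secret).update(`${head}.${body}`).digest("base64url");
  if (sig !== expected) return "bad-signature";
  let claims: { exp?: unknown };
  try {
    claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return "malformed";
  }
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) return "expired";
  return "valid";
}

// One token per failure mode, each differing from the valid token in a single respect.
function buildTokenFixtures(secret: string, nowSec: number) {
  const valid = signToken({ sub: "user-1", exp: nowSec + 3600 }, secret);
  const pieces = valid.split(".") as [string, string, string];
  return {
    valid,
    expired: signToken({ sub: "user-1", exp: nowSec - 60 }, secret),
    wrongSecret: signToken({ sub: "user-1", exp: nowSec + 3600 }, `${secret}-other`),
    // Payload swapped after signing, signature left untouched.
    tampered: `${pieces[0]}.${encodeSegment({ sub: "admin", exp: nowSec + 3600 })}.${pieces[2]}`,
    malformed: "not-a-jwt",
  };
}
```

In the E2E suite each fixture would be sent as a Bearer token, with the valid one expected to return 200 and every other one 401.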
1.4–1.6 CRUD & Management Tests
- User CRUD — paginated list, nested roles, bcrypt hash verification
- Role & permission CRUD — hierarchy, flags (view/create/update/delete)
- Secret key CRUD — x-secret validation, rotation
1.7 Password Reset & Invitation Tests
- Reset flow — token creation, completion, expiry handling
- Invitation flow — hash generation, acceptance, expiry
1.8 RabbitMQ Contract Tests
- auth pattern — valid/expired/revoked/malformed tokens, snapshot response schema
- auth.secret pattern — valid/invalid keys, snapshot response schema
- Contract snapshots become the formal spec for all future changes
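A contract snapshot can record the shape of a response (field names and value types) rather than its values, so it stays stable across test data while still catching renamed, removed, or retyped fields. A minimal sketch of that idea; `shapeOf`, `conforms`, and the sample response fields are hypothetical, and array shapes are inferred from the first element only.

```typescript
type Shape = string | Shape[] | { [key: string]: Shape };

// Reduce a response to its structure: field names and primitive type names.
function shapeOf(value: unknown): Shape {
  if (value === null) return "null";
  if (Array.isArray(value)) return value.length > 0 ? [shapeOf(value[0])] : [];
  if (typeof value === "object") {
    const out: { [key: string]: Shape } = {};
    for (const key of Object.keys(value as object).sort()) {
      out[key] = shapeOf((value as Record<string, unknown>)[key]);
    }
    return out;
  }
  return typeof value;
}

// Check a later response against a recorded snapshot shape.
function conforms(value: unknown, shape: Shape): boolean {
  if (typeof shape === "string") {
    return shape === "null" ? value === null : value !== null && typeof value === shape;
  }
  if (Array.isArray(shape)) {
    if (!Array.isArray(value)) return false;
    const element = shape[0];
    // An empty recorded array constrains nothing about its elements.
    return element === undefined || value.every((item) => conforms(item, element));
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  const expectedKeys = Object.keys(shape).sort();
  const actualKeys = Object.keys(record).sort();
  if (expectedKeys.join("\u0000") !== actualKeys.join("\u0000")) return false;
  return expectedKeys.every((key) => conforms(record[key], shape[key] as Shape));
}
```

The snapshot would be recorded once from the current NestJS service with `shapeOf` and then reused unchanged as the check for the gRPC responses and the Go rewrite.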
1.9 Edge Cases & Performance Baseline
- 100 parallel auth pattern messages — all return correct results
- SQL injection, oversized inputs, unicode edge cases
- Record baseline latency, memory, CPU at idle and under 1000 req/s load
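Baseline latency is easier to compare later if it is stored as percentiles rather than an average. A small sketch using the nearest-rank method; the `LatencySummary` shape and the choice of p50/p95/p99 are assumptions, not figures from the plan.

```typescript
interface LatencySummary {
  count: number;
  p50: number;
  p95: number;
  p99: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted sample list.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error("no samples");
  const rank = Math.max(1, Math.ceil((p * sortedMs.length) / 100));
  return sortedMs[rank - 1] as number;
}

// Summarize raw per-request latencies (in milliseconds) from a load run.
function summarizeLatency(samplesMs: number[]): LatencySummary {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return {
    count: sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: percentile(sorted, 100),
  };
}
```

The same summary can be produced for the idle and loaded runs and stored alongside the memory and CPU readings for use in later phases.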
1.10 CI Integration
- Run on every auth-svc PR
- All tests green before proceeding
gRPC on NestJS
Add gRPC listener to existing NestJS auth-svc alongside REST and RabbitMQ. No consumers change. No rewrite.
2.1 Proto Definition
- ValidateToken (unary) — JWT → user object with roles/permissions
- ValidateSecret (unary) — secret key → validation result
- GetUserPermissions (unary) — user ID → full permission set
- Response messages match RabbitMQ contract snapshots exactly
- Store proto in shared location (repo, package, or buf registry)
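To make the "full permission set" returned by GetUserPermissions concrete, the sketch below models the reply as TypeScript types and merges per-role grants into one set per module. Everything here is an assumption for illustration: the field names, the `RoleGrant` shape, and in particular the rule that a flag is granted if any of the user's roles grants it. The real semantics should be taken from the existing NestJS service and the RabbitMQ contract snapshots.

```typescript
// Assumed shapes; field names are illustrative, not the real proto messages.
interface PermissionFlags {
  view: boolean;
  create: boolean;
  update: boolean;
  delete: boolean;
}

interface RoleGrant {
  role: string;
  module: string;
  flags: PermissionFlags;
}

interface UserPermissionsReply {
  userId: string;
  permissions: Record<string, PermissionFlags>;
}

// Merge grants from all roles: a flag is set if any role grants it (assumed rule).
function mergePermissions(grants: RoleGrant[]): Record<string, PermissionFlags> {
  const merged: Record<string, PermissionFlags> = {};
  for (const grant of grants) {
    const current = merged[grant.module] ?? { view: false, create: false, update: false, delete: false };
    merged[grant.module] = {
      view: current.view || grant.flags.view,
      create: current.create || grant.flags.create,
      update: current.update || grant.flags.update,
      delete: current.delete || grant.flags.delete,
    };
  }
  return merged;
}

// Build the reply a GetUserPermissions handler might return.
function buildPermissionsReply(userId: string, grants: RoleGrant[]): UserPermissionsReply {
  return { userId, permissions: mergePermissions(grants) };
}
```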
2.2 NestJS gRPC Server
- @nestjs/microservices gRPC transport — second transport alongside RMQ
- Separate port (e.g., 50051), same service layer as REST/RMQ controllers
- Prometheus metrics for gRPC (count, latency, errors)
- gRPC health check endpoint
2.3 Testing
- gRPC integration tests verifying parity with RMQ contract snapshots
- Load test gRPC endpoint for baselines
- Full Playwright E2E re-run — no regression on REST + RMQ
2.4 Proto Distribution
- Generate TypeScript types and gRPC client stubs from proto
- Publish as @nsix/auth-proto or bundle with auth-guard package
2.5 Deploy
- Staging: gRPC port exposed, verify reachability
- Production: gRPC listener idle — no consumers yet
Package + KrakenD
Build the shared package with two strategies (RabbitMQ + gateway headers) and configure KrakenD simultaneously. Each migrated service immediately gets the full latency win.
3.1 Package Scaffolding
- @nsix/auth-guard npm package in shared repo
- TypeScript build, lint, publish pipeline
- Public API: modules, guards, decorators, interfaces
3.2 Two-Strategy Design
- RabbitMqStrategy — current behavior, backward compatible default
- GatewayHeaderStrategy — trusts x-user-id, x-user-roles, x-user-permissions from KrakenD
- No gRPC strategy in the package: services move straight from RabbitMQ to gateway headers
- If gateway headers missing → reject (fail closed)
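The fail-closed rule for the gateway strategy can be sketched without any framework code. This is an illustration under stated assumptions: the `AuthStrategy` interface, the `GatewayUser` shape, and the comma-separated encoding of `x-user-roles` and `x-user-permissions` are hypothetical, since the plan does not specify how KrakenD serializes these headers.

```typescript
interface GatewayUser {
  id: string;
  roles: string[];
  permissions: string[];
}

// Lower-cased header names mapped to their values.
type IncomingHeaders = Record<string, string | undefined>;

// Common contract for both strategies; the RabbitMQ strategy would return a Promise.
interface AuthStrategy {
  authenticate(headers: IncomingHeaders): GatewayUser | null | Promise<GatewayUser | null>;
}

// Assumed encoding: comma-separated values.
function splitList(raw: string): string[] {
  return raw.split(",").map((item) => item.trim()).filter((item) => item.length > 0);
}

class GatewayHeaderStrategy implements AuthStrategy {
  // Returns null (reject) unless all three identity headers are present.
  authenticate(headers: IncomingHeaders): GatewayUser | null {
    const id = headers["x-user-id"];
    const roles = headers["x-user-roles"];
    const permissions = headers["x-user-permissions"];
    if (!id || roles === undefined || permissions === undefined) return null;
    return { id, roles: splitList(roles), permissions: splitList(permissions) };
  }
}
```

Trusting these headers is only safe if services are reachable exclusively through the gateway and the gateway discards any client-supplied `x-user-*` headers, which is worth verifying during staging validation.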
3.3 NestJS Module & Guards
- NsixAuthModule.forRoot(options) / forRootAsync(options)
- NsixAuthGuard — CanActivate, delegates to active strategy
- NsixSecretGuard — x-secret validation
- Decorators: @CurrentUser(), @Roles(), @Permissions(), @Public()
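The decision the guard makes once a strategy has produced a user can be written as a pure function, which also makes the "all guard and decorator combinations" tests straightforward. The sketch below is framework-free and makes two assumptions the plan does not state: `@Roles()` passes if the user has any listed role, and `@Permissions()` requires every listed permission. `RouteRequirements`, `Principal`, and `isRequestAllowed` are hypothetical names.

```typescript
// Metadata the decorators would attach to a route (assumed shape).
interface RouteRequirements {
  isPublic?: boolean;
  roles?: string[];
  permissions?: string[];
}

// The user object produced by the active strategy (assumed shape).
interface Principal {
  roles: string[];
  permissions: string[];
}

// Decide whether a request may proceed.
function isRequestAllowed(principal: Principal | null, route: RouteRequirements): boolean {
  // @Public() routes bypass authentication entirely.
  if (route.isPublic) return true;
  // Everything else requires an authenticated user.
  if (principal === null) return false;
  const requiredRoles = route.roles ?? [];
  const requiredPermissions = route.permissions ?? [];
  // Assumed semantics: any-of for roles, all-of for permissions.
  const rolesOk = requiredRoles.length === 0 || requiredRoles.some((r) => principal.roles.includes(r));
  const permissionsOk = requiredPermissions.every((p) => principal.permissions.includes(p));
  return rolesOk && permissionsOk;
}
```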
3.4 Token Caching Layer
- Optional Valkey/Redis cache for validated tokens (RMQ strategy only)
- Configurable TTL (default 60s), invalidation on logout/role-change
- Opt-in via configuration
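The cache behaviour described above (TTL expiry plus explicit invalidation on logout and role change) can be sketched with an in-memory map standing in for Valkey/Redis. The class and method names are hypothetical; a real implementation would likely key entries by a hash of the token rather than the raw token and would rely on the store's own expiry.

```typescript
interface CachedEntry<V> {
  value: V;
  userId: string;
  expiresAt: number;
}

class TokenCache<V> {
  private readonly entries = new Map<string, CachedEntry<V>>();

  // The clock is injectable so expiry can be tested without waiting.
  constructor(
    private readonly ttlMs: number = 60000,
    private readonly now: () => number = Date.now,
  ) {}

  // Return the cached value, or undefined on a miss or an expired entry.
  get(token: string): V | undefined {
    const entry = this.entries.get(token);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(token);
      return undefined;
    }
    return entry.value;
  }

  set(token: string, userId: string, value: V): void {
    this.entries.set(token, { value, userId, expiresAt: this.now() + this.ttlMs });
  }

  // Logout: drop a single token.
  invalidateToken(token: string): void {
    this.entries.delete(token);
  }

  // Role change: drop every token cached for the user.
  invalidateUser(userId: string): void {
    this.entries.forEach((entry, token) => {
      if (entry.userId === userId) this.entries.delete(token);
    });
  }
}
```

With the default 60s TTL, a revoked token can still be accepted from cache for up to a minute unless the logout or role-change event reaches `invalidateToken` or `invalidateUser`, which is why invalidation is listed alongside the TTL.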
3.5 Package Testing
- Unit tests per strategy, integration tests with real RMQ
- Gateway header safety check tests (missing headers → reject)
- Cache hit/miss/invalidation/TTL tests
- All guard and decorator combinations
3.6 Documentation
- Installation: npm install @nsix/auth-guard
- Peer deps: @nestjs/core >=10, @nestjs/microservices >=10, ioredis (optional)
- Per-service migration checklist: files to add, files to remove, env vars, verification
3.7 KrakenD Configuration (parallel)
- Routes for all 13 backend services with path-based routing
- JWT validation at the gateway using the same JWT_SECRET as auth-svc
- Header injection: x-user-id, x-user-roles, x-user-permissions, x-correlation-id
- Rate limiting tiers (anon 20/min, auth 100/min, premium 500/min)
- Per-route timeouts, CORS at gateway level
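With 13 services, generating the gateway endpoint entries from a route table may be less error-prone than writing them by hand. The sketch below builds one endpoint object per route with JWT validation, claim-to-header propagation, and a per-route timeout. The key names (`auth/validator`, `propagate_claims`, `input_headers`, `url_pattern`) are believed to follow the KrakenD v2 configuration format and should be verified against the KrakenD documentation. The `RouteSpec` type, the claim names (`sub`, `roles`, `permissions`), the HS256 algorithm, and the key URL are assumptions. Rate-limit tiers, CORS, and `x-correlation-id` generation are left out.

```typescript
// Hypothetical route table entry.
interface RouteSpec {
  gatewayPath: string;
  backendHost: string;
  backendPath: string;
  timeout: string;
}

// Build one KrakenD-style endpoint entry (key names to be verified against the docs).
function buildEndpoint(spec: RouteSpec, jwkUrl: string) {
  return {
    endpoint: spec.gatewayPath,
    method: "GET",
    timeout: spec.timeout,
    // Headers the gateway is allowed to forward to the backend.
    input_headers: ["x-user-id", "x-user-roles", "x-user-permissions", "x-correlation-id"],
    extra_config: {
      "auth/validator": {
        alg: "HS256", // assumed to match how auth-svc signs tokens
        jwk_url: jwkUrl, // assumed location of the key material
        // Copy JWT claims into request headers for the backend (claim names assumed).
        propagate_claims: [
          ["sub", "x-user-id"],
          ["roles", "x-user-roles"],
          ["permissions", "x-user-permissions"],
        ],
      },
    },
    backend: [{ host: [spec.backendHost], url_pattern: spec.backendPath }],
  };
}

// Assemble the endpoints array for the gateway configuration file.
function buildEndpoints(routes: RouteSpec[], jwkUrl: string) {
  return routes.map((route) => buildEndpoint(route, jwkUrl));
}
```

The header names propagated here are the same ones the gateway header strategy reads, so the route table and the package configuration can be checked against each other in tests.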
3.8 Publish v1.0.0
- Internal npm registry, RabbitMQ as default strategy
- Announce with migration guide
Service Migration
Roll out package + gateway in lockstep. Each service migrated to gateway header strategy gets the full 10–18ms latency drop immediately.
4.1 Staging Validation
- Deploy KrakenD to staging
- Run Playwright E2E suite through gateway
- Verify routes, headers, rate limiting
- Load test at peak (100K req/hr)
Services to migrate (13):
- telephony-svc
- storage-svc
- integration-svc
- eventstore-svc
- events-svc
- notify-svc
- pricesync-svc
- inventory-svc
- crm-svc
- commerce-svc
- catalog-svc
- client-svc
- shop-svc
4.2–4.4 Per-Service Migration Steps
- Install @nsix/auth-guard
- Add NsixAuthModule.forRoot({ strategy: 'gateway' }) to AppModule
- Replace existing guard with NsixAuthGuard
- Remove old auth RMQ ClientProxy, old guard files, old middleware
- Run service tests → staging → production
4.5 RabbitMQ Auth Cleanup
- Confirm auth_queue message rate at zero
- Remove auth/auth.secret RMQ consumers from auth-svc
- Remove auth queue from RabbitMQ
- Release @nsix/auth-guard v2.0.0 dropping RabbitMQ strategy
- Auth-svc is now REST + gRPC only
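Before removing the consumers, "message rate at zero" is safer to confirm over several samples than from a single reading. The sketch below evaluates queue objects as returned by the RabbitMQ management HTTP API (`GET /api/queues/{vhost}/{name}`), using only the `messages` and `message_stats.publish_details.rate` fields. The helper name, the sampling approach, and the observation window are assumptions; fetching the samples is left to the caller.

```typescript
// Subset of the queue object returned by the RabbitMQ management API.
interface QueueStats {
  messages?: number;
  consumers?: number;
  message_stats?: {
    publish_details?: { rate: number };
  };
}

// A queue counts as idle only if every sample shows no backlog and no publish rate.
function isQueueIdle(samples: QueueStats[]): boolean {
  // No evidence is not the same as evidence of zero traffic.
  if (samples.length === 0) return false;
  return samples.every((sample) => {
    const backlog = sample.messages ?? 0;
    // message_stats can be absent when the queue has seen no traffic.
    const publishRate = sample.message_stats?.publish_details?.rate ?? 0;
    return backlog === 0 && publishRate === 0;
  });
}
```

Samples would ideally span at least one full business cycle so that low-frequency callers (nightly jobs, weekly syncs) are not missed.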
Go Rewrite
Rewrite auth-svc in Go. All consumers decoupled via gateway — the rewrite is invisible to them. Auth-svc now handles only ~2K–4K req/hr (login/logout/refresh).
5.1 Project Setup
- Go module: cmd/, internal/, pkg/, api/, migrations/
- Libraries: pgx, golang-jwt, golang.org/x/crypto/bcrypt, grpc-go, Prometheus client_golang
- Multi-stage Dockerfile (builder → distroless)
5.2 Database Layer
- Port Prisma schema to Go migrations (golang-migrate or goose)
- Repository layer with pgx + pgxpool connection pooling
- All models: User, Role, RoleUser, PermissionModule, PermissionRole, Token, SecretKey, Position, PasswordReset, UserInvite
5.3 Auth + REST + gRPC
- JWT with golang-jwt/jwt/v5; password verification must accept the existing bcrypt hashes created by the Node.js service
- All REST endpoints matching NestJS surface (paths, shapes, status codes)
- gRPC server from same proto files as Phase 2
- Go Unleash client, port SecretKeyMiddleware
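Because existing users keep their password hashes across the rewrite, it may help to audit the stored hashes before cutover and confirm they all use a bcrypt variant and cost that the Go side will verify. The sketch below parses the standard 60-character bcrypt format (`$2a$`/`$2b$`/`$2y$` prefix, two-digit cost, 53 characters of salt and digest). The helper names are hypothetical, and the audit complements rather than replaces a round-trip test that verifies seeded Node.js hashes with the Go bcrypt package.

```typescript
interface BcryptInfo {
  version: string;
  cost: number;
}

// Parse a bcrypt hash string; returns null if it is not in the expected format.
function parseBcrypt(hash: string): BcryptInfo | null {
  const match = /^\$(2[abxy]?)\$(\d{2})\$[./A-Za-z0-9]{53}$/.exec(hash);
  if (!match) return null;
  const cost = Number(match[2]);
  // bcrypt cost factors are limited to the range 4..31.
  if (cost < 4 || cost > 31) return null;
  return { version: match[1] as string, cost };
}

// Count hashes per "version/cost" pair, plus any that fail to parse.
function auditHashes(hashes: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const hash of hashes) {
    const info = parseBcrypt(hash);
    const bucket = info ? `${info.version}/${info.cost}` : "invalid";
    counts[bucket] = (counts[bucket] ?? 0) + 1;
  }
  return counts;
}
```

Any entries in the `invalid` bucket would need investigation before the Go service goes live, since those users could not log in after the swap.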
5.4 Acceptance & Deploy
- Full Playwright E2E against Go auth-svc — zero test modifications
- gRPC contract tests pass
- Performance comparison vs Phase 1 baselines
- Blue/green production deploy (NestJS → Go image swap)
- Rollback ready (revert image tag), monitor 1 week
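The performance comparison against the Phase 1 baselines can be turned into an explicit gate by flagging any metric that exceeds its baseline by more than an agreed tolerance. A minimal sketch; the `PerfSnapshot` fields and the 10% default tolerance are assumptions, not thresholds from the plan.

```typescript
// Metrics recorded for both the NestJS baseline and the Go candidate (assumed set).
interface PerfSnapshot {
  p95Ms: number;
  memoryMb: number;
  cpuPercent: number;
}

// Return the names of metrics where the candidate is worse than baseline * (1 + tolerance).
function findRegressions(
  baseline: PerfSnapshot,
  candidate: PerfSnapshot,
  tolerance: number = 0.1,
): (keyof PerfSnapshot)[] {
  const metrics: (keyof PerfSnapshot)[] = ["p95Ms", "memoryMb", "cpuPercent"];
  return metrics.filter((metric) => candidate[metric] > baseline[metric] * (1 + tolerance));
}
```

An empty result means the candidate is within tolerance on every metric; a non-empty result names the metrics to investigate before the blue/green swap.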