N-six Platform

Auth-svc
Migration Plan

Add gRPC to the existing NestJS auth-svc, deploy a KrakenD gateway for an immediate 10–18ms latency reduction per request, migrate all 13 services, then rewrite auth-svc in Go.

Auth latency: 10–20ms → <2ms (after gateway)
RMQ auth msgs: 110K/hr → 0 (eliminated)
Auth memory: 1.5GB → 30MB (after Go rewrite)

Two deliverables
  • Auth-svc evolution: add gRPC to existing NestJS → deploy gateway → rewrite in Go when nearly idle
  • @nsix/auth-guard package: shared NestJS package replacing RabbitMQ auth across all 13 consuming services
🧪
Phase 1

Playwright E2E

Build a comprehensive test suite validating all auth-svc behavior through its public interfaces. Safety net for every subsequent phase and acceptance gate for the Go rewrite.

1.1 Environment Setup

  • Playwright test project in auth-svc-e2e/
  • Docker Compose: auth-svc, PostgreSQL, RabbitMQ, Valkey in isolation
  • Seed scripts with deterministic data: users, roles, permissions, tokens, secret keys
  • Pinned environment variables for reproducibility

1.2 Authentication Flow Tests

  • Login — valid creds, wrong password, nonexistent user, disabled user
  • Logout — token invalidation verified on subsequent requests
  • Refresh — valid token, expired token, already-used token

1.3 Token Validation Tests

  • Valid Bearer → 200 + user with roles/permissions
  • Expired, revoked, malformed, wrong-secret, tampered tokens → 401

1.4–1.6 CRUD & Management Tests

  • User CRUD — paginated list, nested roles, bcrypt hash verification
  • Role & permission CRUD — hierarchy, flags (view/create/update/delete)
  • Secret key CRUD — x-secret validation, rotation

1.7 Password Reset & Invitation Tests

  • Reset flow — token creation, completion, expiry handling
  • Invitation flow — hash generation, acceptance, expiry

1.8 RabbitMQ Contract Tests

  • auth pattern — valid/expired/revoked/malformed tokens, snapshot response schema
  • auth.secret pattern — valid/invalid keys, snapshot response schema
  • Contract snapshots become the formal spec for all future changes
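
One way to snapshot a response schema rather than its values is to reduce each reply to its shape and compare shapes. A minimal sketch, assuming replies are plain JSON; the names shapeOf and matchesSnapshot and the sample fields in the usage note are hypothetical, not taken from auth-svc.

```typescript
// A shape records types only, never values.
type Shape = string | Shape[] | { [key: string]: Shape };

function shapeOf(value: unknown): Shape {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Describe an array by the shape of its first element; empty arrays stay empty.
    return value.length > 0 ? [shapeOf(value[0])] : [];
  }
  if (typeof value === "object") {
    const out: { [key: string]: Shape } = {};
    // Sort keys so the serialized shape is stable regardless of field order.
    for (const key of Object.keys(value as object).sort()) {
      out[key] = shapeOf((value as Record<string, unknown>)[key]);
    }
    return out;
  }
  return typeof value;
}

// The snapshot is expected to come from shapeOf() on a recorded reply.
function matchesSnapshot(reply: unknown, snapshot: Shape): boolean {
  return JSON.stringify(shapeOf(reply)) === JSON.stringify(snapshot);
}
```

Usage: record const snapshot = shapeOf(recordedReply) once per pattern, commit it, and assert matchesSnapshot(newReply, snapshot) in later phases. A renamed field, a missing field, or a changed type all fail the comparison.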

1.9 Edge Cases & Performance Baseline

  • 100 parallel auth pattern messages — all return correct results
  • SQL injection, oversized inputs, unicode edge cases
  • Record baseline latency, memory, CPU at idle and under 1000 req/s load
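
The latency baseline is easier to compare across phases when it is stored as percentiles rather than an average. A minimal sketch using the nearest-rank method; the function names and the choice of p50/p95/p99 are assumptions, not something the plan prescribes.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function latencyPercentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing so integer percentiles give exact ranks.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

interface LatencyBaseline {
  p50: number;
  p95: number;
  p99: number;
  max: number;
}

function summarizeLatency(samplesMs: number[]): LatencyBaseline {
  return {
    p50: latencyPercentile(samplesMs, 50),
    p95: latencyPercentile(samplesMs, 95),
    p99: latencyPercentile(samplesMs, 99),
    max: latencyPercentile(samplesMs, 100),
  };
}
```

The same summary can be recorded at idle and under the 1000 req/s load, then reused as the comparison point in 5.4.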

1.10 CI Integration

  • Run on every auth-svc PR
  • All tests green before proceeding
Phase 2

gRPC on NestJS

Add gRPC listener to existing NestJS auth-svc alongside REST and RabbitMQ. No consumers change. No rewrite.

2.1 Proto Definition

  • ValidateToken (unary) — JWT → user object with roles/permissions
  • ValidateSecret (unary) — secret key → validation result
  • GetUserPermissions (unary) — user ID → full permission set
  • Response messages match RabbitMQ contract snapshots exactly
  • Store proto in shared location (repo, package, or buf registry)

2.2 NestJS gRPC Server

  • @nestjs/microservices gRPC transport — second transport alongside RMQ
  • Separate port (e.g., 50051), same service layer as REST/RMQ controllers
  • Prometheus metrics for gRPC (count, latency, errors)
  • gRPC health check endpoint

2.3 Testing

  • gRPC integration tests verifying parity with RMQ contract snapshots
  • Load test gRPC endpoint for baselines
  • Full Playwright E2E re-run — no regression on REST + RMQ

2.4 Proto Distribution

  • Generate TypeScript types and gRPC client stubs from proto
  • Publish as @nsix/auth-proto or bundle with auth-guard package

2.5 Deploy

  • Staging: gRPC port exposed, verify reachability
  • Production: gRPC listener idle — no consumers yet
📦
Phase 3

Package + KrakenD

Build the shared package with two strategies (RabbitMQ + gateway headers) and configure KrakenD simultaneously. Each migrated service immediately gets the full latency win.

3.1 Package Scaffolding

  • @nsix/auth-guard npm package in shared repo
  • TypeScript build, lint, publish pipeline
  • Public API: modules, guards, decorators, interfaces

3.2 Two-Strategy Design

  • RabbitMqStrategy — current behavior, backward compatible default
  • GatewayHeaderStrategy — trusts x-user-id, x-user-roles, x-user-permissions from KrakenD
  • No gRPC strategy in the package: services move straight from RabbitMQ to gateway headers
  • If gateway headers missing → reject (fail closed)
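
The fail-closed rule can be sketched as a pure function that returns a user only when every expected header is present. This is an illustration under assumptions: the comma-separated encoding of roles and permissions, the GatewayUser shape, and the function names are hypothetical.

```typescript
// Hypothetical user shape; the real auth-svc user object may carry more fields.
interface GatewayUser {
  id: string;
  roles: string[];
  permissions: string[];
}

type HeaderMap = Record<string, string | undefined>;

// Assumption: the gateway sends roles and permissions as comma-separated lists.
function splitList(raw: string): string[] {
  return raw
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Returns the user when all three headers are present, otherwise null.
// Callers treat null as "reject the request" (fail closed).
function userFromGatewayHeaders(headers: HeaderMap): GatewayUser | null {
  const id = headers["x-user-id"];
  const roles = headers["x-user-roles"];
  const permissions = headers["x-user-permissions"];
  if (id === undefined || id.trim() === "") return null;
  if (roles === undefined || permissions === undefined) return null;
  return {
    id: id.trim(),
    roles: splitList(roles),
    permissions: splitList(permissions),
  };
}
```

Trusting these headers is only safe if services are reachable solely through KrakenD and the gateway overwrites any client-supplied x-user-* headers; the sketch does not enforce either condition.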

3.3 NestJS Module & Guards

  • NsixAuthModule.forRoot(options) / forRootAsync(options)
  • NsixAuthGuard — CanActivate, delegates to active strategy
  • NsixSecretGuard — x-secret validation
  • Decorators: @CurrentUser(), @Roles(), @Permissions(), @Public()
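
The delegation between guard and strategy can be sketched without the framework. The version below is a dependency-free illustration, not the package's code: every type and class name is hypothetical, route metadata stands in for what @Public() and @Roles() would attach, and "any one of the listed roles" is an assumed meaning for @Roles().

```typescript
interface SketchUser {
  id: string;
  roles: string[];
}

interface SketchRequest {
  headers: Record<string, string | undefined>;
  user?: SketchUser;
}

// A strategy resolves a request to a user, or null when authentication fails.
interface AuthStrategySketch {
  authenticate(req: SketchRequest): Promise<SketchUser | null>;
}

// Stand-in for the metadata that @Public() and @Roles() would attach to a route.
interface RouteMetaSketch {
  isPublic?: boolean;
  requiredRoles?: string[];
}

class AuthGuardSketch {
  private strategy: AuthStrategySketch;

  constructor(strategy: AuthStrategySketch) {
    this.strategy = strategy;
  }

  async canActivate(req: SketchRequest, meta: RouteMetaSketch): Promise<boolean> {
    if (meta.isPublic === true) return true; // public routes skip authentication
    const user = await this.strategy.authenticate(req);
    if (user === null) return false; // fail closed
    req.user = user; // what a @CurrentUser()-style decorator would read later
    const required = meta.requiredRoles ?? [];
    // Assumption: access is granted when the user holds any one required role.
    return required.length === 0 || required.some((role) => user.roles.indexOf(role) !== -1);
  }
}
```

Because the guard only sees the strategy interface, switching a service from the RabbitMQ strategy to the gateway header strategy would be a configuration change rather than a guard change.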

3.4 Token Caching Layer

  • Optional Valkey/Redis cache for validated tokens (RMQ strategy only)
  • Configurable TTL (default 60s), invalidation on logout/role-change
  • Opt-in via configuration
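
The cache behavior (TTL expiry plus invalidation on logout and role change) can be sketched with an in-memory map standing in for Valkey/Redis. The class and method names are hypothetical, and a real implementation would be asynchronous and would likely key on a hash of the token rather than the raw token.

```typescript
interface CachedUser {
  id: string;
  roles: string[];
}

// In-memory stand-in for Valkey/Redis so the sketch runs without a server.
class TokenCacheSketch {
  private entries = new Map<string, { user: CachedUser; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // Default TTL of 60s matches the plan; the clock is injectable for tests.
  constructor(ttlMs: number = 60000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(token: string): CachedUser | null {
    const entry = this.entries.get(token);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(token); // expired: drop the entry and report a miss
      return null;
    }
    return entry.user;
  }

  set(token: string, user: CachedUser): void {
    this.entries.set(token, { user, expiresAt: this.now() + this.ttlMs });
  }

  // Logout: drop a single token.
  invalidateToken(token: string): void {
    this.entries.delete(token);
  }

  // Role change: drop every cached token that belongs to the user.
  invalidateUser(userId: string): void {
    this.entries.forEach((entry, token) => {
      if (entry.user.id === userId) this.entries.delete(token);
    });
  }
}
```

On a miss the RabbitMQ strategy would validate as it does today and then call set; the TTL bounds how long a stale permission set can survive if an invalidation event is lost.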

3.5 Package Testing

  • Unit tests per strategy, integration tests with real RMQ
  • Gateway header safety check tests (missing headers → reject)
  • Cache hit/miss/invalidation/TTL tests
  • All guard and decorator combinations

3.6 Documentation

  • Installation: npm install @nsix/auth-guard
  • Peer deps: @nestjs/core >=10, @nestjs/microservices >=10, ioredis (optional)
  • Per-service migration checklist: files to add, files to remove, env vars, verification

3.7 KrakenD Configuration (parallel)

  • Routes for all 13 backend services with path-based routing
  • JWT validation plugin with same JWT_SECRET
  • Header injection: x-user-id, x-user-roles, x-user-permissions, x-correlation-id
  • Rate limiting tiers (anon 20/min, auth 100/min, premium 500/min)
  • Per-route timeouts, CORS at gateway level
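
To make the tier numbers concrete, the sketch below models them with a simple fixed-window counter. It illustrates what "20/min", "100/min", and "500/min" mean per client; it is not KrakenD configuration, and KrakenD's own limiter may behave differently, especially around window boundaries. All names are hypothetical.

```typescript
type Tier = "anon" | "auth" | "premium";

// Requests allowed per client per minute, taken from the tier list above.
const LIMITS_PER_MINUTE: Record<Tier, number> = {
  anon: 20,
  auth: 100,
  premium: 500,
};

const WINDOW_MS = 60000;

class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  // Returns true when the request fits in the client's current one-minute window.
  allow(clientId: string, tier: Tier, nowMs: number): boolean {
    const limit = LIMITS_PER_MINUTE[tier];
    const current = this.windows.get(clientId);
    if (current === undefined || nowMs - current.start >= WINDOW_MS) {
      // First request, or the previous window has elapsed: start a new window.
      this.windows.set(clientId, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  }
}
```

A load test in staging (4.1) can use the same tier table to assert that the 21st anonymous request within a minute is rejected while authenticated traffic is not.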

3.8 Publish v1.0.0

  • Internal npm registry, RabbitMQ as default strategy
  • Announce with migration guide
🔄
Phase 4

Service Migration

Roll out package + gateway in lockstep. Each service migrated to gateway header strategy gets the full 10–18ms latency drop immediately.

4.1 Staging Validation

  • Deploy KrakenD to staging
  • Run Playwright E2E suite through gateway
  • Verify routes, headers, rate limiting
  • Load test at peak (100K req/hr)
WAVE 1 — LOW RISK
  • telephony-svc
  • storage-svc
  • integration-svc
  • eventstore-svc
WAVE 2 — MEDIUM RISK
  • events-svc
  • notify-svc
  • pricesync-svc
  • inventory-svc
WAVE 3 — HIGH TRAFFIC
  • crm-svc
  • commerce-svc
  • catalog-svc
  • client-svc
  • shop-svc
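
The wave order can be kept as data so tooling can report what is still pending. A minimal sketch; the rule that a wave starts only after the previous one is fully migrated is an assumption, and the type and function names are hypothetical.

```typescript
interface MigrationWave {
  name: string;
  services: string[];
}

// Service names and grouping copied from the wave lists above.
const MIGRATION_WAVES: MigrationWave[] = [
  {
    name: "Wave 1 (low risk)",
    services: ["telephony-svc", "storage-svc", "integration-svc", "eventstore-svc"],
  },
  {
    name: "Wave 2 (medium risk)",
    services: ["events-svc", "notify-svc", "pricesync-svc", "inventory-svc"],
  },
  {
    name: "Wave 3 (high traffic)",
    services: ["crm-svc", "commerce-svc", "catalog-svc", "client-svc", "shop-svc"],
  },
];

// Returns the services still to migrate in the earliest unfinished wave,
// or an empty list once every wave is done.
function remainingInCurrentWave(migrated: string[]): string[] {
  for (const wave of MIGRATION_WAVES) {
    const pending = wave.services.filter((svc) => migrated.indexOf(svc) === -1);
    if (pending.length > 0) return pending;
  }
  return [];
}
```

Usage: remainingInCurrentWave(["telephony-svc"]) lists the three other Wave 1 services, and an empty result is the signal that 4.5 (RabbitMQ auth cleanup) can begin.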

4.2–4.4 Per-Service Migration Steps

  • Install @nsix/auth-guard
  • Add NsixAuthModule.forRoot({ strategy: 'gateway' }) to AppModule
  • Replace existing guard with NsixAuthGuard
  • Remove old auth RMQ ClientProxy, old guard files, old middleware
  • Run service tests → staging → production

4.5 RabbitMQ Auth Cleanup

  • Confirm auth_queue message rate at zero
  • Remove auth/auth.secret RMQ consumers from auth-svc
  • Remove auth queue from RabbitMQ
  • Release @nsix/auth-guard v2.0.0 dropping RabbitMQ strategy
  • Auth-svc is now REST + gRPC only
🦫
Phase 5

Go Rewrite

Rewrite auth-svc in Go. All consumers decoupled via gateway — the rewrite is invisible to them. Auth-svc now handles only ~2K–4K req/hr (login/logout/refresh).

Low-risk: E2E validates REST, proto defines gRPC, no RMQ consumers remain. Deployment = image swap.

5.1 Project Setup

  • Go module: cmd/, internal/, pkg/, api/, migrations/
  • Libraries: pgx, golang-jwt, bcrypt, grpc-go, prometheus
  • Multi-stage Dockerfile (builder → distroless)

5.2 Database Layer

  • Port Prisma schema to Go migrations (golang-migrate or goose)
  • Repository layer with pgx + pgxpool connection pooling
  • All models: User, Role, RoleUser, PermissionModule, PermissionRole, Token, SecretKey, Position, PasswordReset, UserInvite

5.3 Auth + REST + gRPC

  • JWT with golang-jwt/jwt/v5; password verification must accept the existing Node.js bcrypt hashes
  • All REST endpoints matching NestJS surface (paths, shapes, status codes)
  • gRPC server from same proto files as Phase 2
  • Integrate the Go Unleash client; port SecretKeyMiddleware

5.4 Acceptance & Deploy

  • Full Playwright E2E against Go auth-svc — zero test modifications
  • gRPC contract tests pass
  • Performance comparison vs Phase 1 baselines
  • Blue/green production deploy (NestJS → Go image swap)
  • Rollback ready (revert image tag), monitor 1 week
Expected gains
  • Memory per replica: 256–512 MB → 20–40 MB (~90% reduction)
  • CPU per replica: 0.5–1.0 vCPU → 0.01–0.03 vCPU (~95% reduction)
  • Cold start: 3–5 sec → 50–100 ms (~50x faster)
  • Container image: 300–500 MB → 15–25 MB (~95% smaller)
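
The headline percentages can be checked against the quoted ranges with a few lines of arithmetic. The sketch pairs the low end of each NestJS range with the low end of the Go range, and high with high; that pairing is an assumption, since the plan does not say how the summary figures were derived.

```typescript
// Percentage by which a value shrinks going from `before` to `after`.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// How many times faster `after` is than `before` (same unit for both).
function speedupFactor(before: number, after: number): number {
  return before / after;
}

const gains = {
  memoryLow: percentReduction(256, 20), // about 92%
  memoryHigh: percentReduction(512, 40), // about 92%
  cpuLow: percentReduction(0.5, 0.01), // about 98%
  cpuHigh: percentReduction(1.0, 0.03), // about 97%
  coldStartLow: speedupFactor(3000, 50), // 60x, both values in ms
  coldStartHigh: speedupFactor(5000, 100), // 50x, both values in ms
  imageLow: percentReduction(300, 15), // about 95%
  imageHigh: percentReduction(500, 25), // about 95%
};
```

Under this pairing the rounded summary figures (~90%, ~95%, ~50x, ~95%) sit at or slightly below what the ranges imply, so they read as conservative estimates.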

Risks & Success Criteria

Risk Mitigation
  • Risk: gRPC destabilizes NestJS auth-svc
    Mitigation: E2E suite catches regressions; gRPC is a separate transport and doesn't touch existing paths
  • Risk: Service migration breaks a consumer
    Mitigation: Wave rollout; RabbitMQ fallback; rollback = revert npm version
  • Risk: Gateway misconfigures auth headers
    Mitigation: Gateway header strategy fails closed: missing headers → reject
  • Risk: Go rewrite has behavior differences
    Mitigation: E2E + gRPC tests validate full contract; consumers already decoupled
  • Risk: Stale cached permissions
    Mitigation: Short TTL (60s) + active invalidation on logout and role changes

Success Criteria
  • All Playwright tests pass at every phase transition
  • Auth latency: 10–20ms → under 2ms after gateway deployment
  • RabbitMQ auth messages: ~55K–110K/hr → zero after Phase 4
  • All 13 services migrated with no auth-related incidents
  • Go auth-svc passes full E2E with zero test modifications
  • Auth-svc memory: 512MB–1.5GB → under 30MB after Go rewrite