Case Studies

Vaillancore
2023
A scalable AI infrastructure that powers context-aware agents and integrates seamlessly with third-party apps via API.
Challenges
- Designing a multi-tenant AI architecture with secure API access
- Reducing latency in AI responses while maintaining long-term memory
- Supporting dynamic vector search with scoped namespaces per customer
- Managing efficient context retrieval from Redis, MongoDB, and Pinecone
- Handling low-cost, real-time message caching and token-based auth for external API calls
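The scoped-namespace requirement above can be sketched with a minimal in-memory stand-in for a Pinecone-style index (the `InMemoryIndex` class and its method names are illustrative assumptions, not the production API):

```python
# Minimal sketch of namespace-scoped vector search: each tenant's vectors
# live under their own namespace, so queries never cross tenant boundaries.
import math
from collections import defaultdict

class InMemoryIndex:
    def __init__(self):
        # namespace -> {id: vector}; one bucket per tenant
        self._store = defaultdict(dict)

    def upsert(self, namespace, vec_id, vector):
        self._store[namespace][vec_id] = vector

    def query(self, namespace, vector, top_k=3):
        # Only the caller's namespace is searched -- other tenants' vectors
        # are never candidates, which is the isolation property at stake.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(cosine(vector, v), i) for i, v in self._store[namespace].items()]
        return [i for _, i in sorted(scored, reverse=True)[:top_k]]
```

In the real system the namespace would come from the authenticated tenant's API key, so isolation is enforced at the query layer rather than trusted to the client.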
Solutions
- Built a production-ready AI service from scratch in under a week
- Deployed scalable backend on AWS EC2 with NGINX, Gunicorn, and Docker
- Integrated Auth0 and custom API key management for secure multi-tenant access
- Implemented hybrid memory system using Redis (short-term) and Pinecone (long-term)
- Created a flexible queue system for async message embedding and upsert
- Added observability using Prometheus, Grafana, and Uptime Kuma
- Developed single endpoint for user chat input that handles full memory lifecycle
- Set up API Gateway and CloudFront for scalable public access
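The hybrid short-term/long-term memory split can be sketched as follows, with a bounded recent-message buffer standing in for Redis and a simple keyword store standing in for Pinecone (the `HybridMemory` class and its parameters are illustrative assumptions):

```python
# Hedged sketch of the hybrid memory lifecycle: every message is cached
# for immediate context and archived for later retrieval; building context
# combines recent turns with relevant older ones.
from collections import deque

class HybridMemory:
    def __init__(self, short_term_size=5):
        self.short_term = deque(maxlen=short_term_size)  # recent turns (Redis role)
        self.long_term = []                              # durable history (Pinecone role)

    def remember(self, message):
        # Full lifecycle for one chat input: cache and archive.
        self.short_term.append(message)
        self.long_term.append(message)

    def context(self, query):
        # Recent turns are always included; older turns are recalled only
        # when relevant to the query (real system: vector similarity,
        # here: naive substring match for illustration).
        recalled = [m for m in self.long_term
                    if m not in self.short_term and query.lower() in m.lower()]
        return recalled + list(self.short_term)
```

A single chat endpoint can then call `remember` on the incoming message and `context` to assemble the prompt, which is one way the "full memory lifecycle" behind one endpoint could be structured.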
Results
- Enabled beta customers to integrate AI agents within minutes using one API
- Reduced AI response latency to under 300ms using caching and smart chunking
- Scaled to support multiple customer namespaces with isolated knowledge bases
- Achieved >90% context relevance accuracy in retrieved AI responses
- Launched fully operational SaaS MVP and onboarded initial paying users
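The chunking that supports low-latency retrieval can be sketched as a fixed-size word window with a small overlap so context isn't cut mid-thought (the function name and the `size`/`overlap` defaults are assumptions, not the production values):

```python
# Illustrative chunker: split text into overlapping word windows so each
# chunk is small enough to embed and retrieve quickly, while the overlap
# preserves continuity between adjacent chunks.
def chunk_text(text, size=50, overlap=10):
    words = text.split()
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```

Smaller chunks mean tighter, more relevant retrieval hits, which is one lever behind both the sub-300ms latency and the context-relevance figures above.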
Tech Stack
Next.js, TypeScript, Tailwind CSS, Flask, Redis, MongoDB, Pinecone, Auth0, Docker, AWS, NGINX, OpenAI API