Job Description
Roles & Responsibilities
Lead and mentor a team of highly talented junior ML engineers through:
- Code reviews, design reviews, and technical direction
- Enforcement of strong software engineering and ML best practices
Design, deploy, and operate scalable AI systems with a focus on reliability and performance
Lead production deployment of LLM-based and multimodal systems (RAG, OCR, voice)
Own model performance end-to-end, combining evaluation, observability, and hardware optimization:
- Build evaluation pipelines (benchmarks, regression testing, LLM-as-judge)
- Implement deep observability (tracing, latency monitoring, error tracking)
- Optimize GPU utilization (multi-GPU serving, batching, quantization, memory tuning)
- Continuously improve throughput, latency, and cost efficiency
Architect and manage GPU infrastructure:
- Model serving, load balancing, and scaling strategies
- Hardware-aware deployment and performance tuning
Build and maintain robust MLOps pipelines:
- Model/version management, CI/CD, automated testing, and rollback strategies
- Monitoring and feedback loops for continuous improvement
Engage directly with clients and stakeholders to:
- Gather and clarify business requirements
- Translate non-technical needs into well-defined technical problems
- Communicate solutions, trade-offs, and progress through clear documentation, reports, and proposals
Contribute hands-on to system design, implementation, debugging, and production incident resolution