AI Engineer at HypeOn AI · Building in public

Umarfarook Gurramkonda

I build production LLM systems. Multi-stage agents, retrieval pipelines, natural-language-to-SQL over warehouses, and the eval harnesses that keep them honest.

Bangalore, India · Remote & SF · Shipping 5 OSS projects

what i actually do

Five lanes, one focus

LLM Orchestration

Multi-stage agent flows with intent detection, routing, retrieval, and composition. Streaming responses, structured outputs, and fallback chains across Claude, Gemini, and OpenAI.

Retrieval & RAG

Hybrid retrieval (BM25 + vector + reranker), chunking strategies, metadata filtering, multimodal RAG. Tuning that survives real document corpora.

Natural Language to SQL

Schema discovery, synonym matching, cost-capped query generation over BigQuery and Postgres. With a real eval harness, not vibes.

Voice & Realtime Agents

Full-duplex voice agents on top of streaming speech models. Latency engineering, interruption handling, turn-taking that feels human.

Evals & Observability

Eval harnesses, regression suites, cost dashboards. The unsexy work that separates a prototype from a system you can defend.

now shipping

Building in public

Five OSS projects across the lanes I care about. Each ships with evals, a public URL, and a write-up on the tradeoffs. Live status below.

01

BigQuery NL2SQL MCP Server

Shipping Soon

Query BigQuery in natural language from Claude Desktop, Cursor, and Claude Code. Schema discovery, cost caps, query explanation, safety guardrails.

MCP is an underserved lane. NL2SQL over warehouses is something I ship at work. First project.

Python · FastMCP · BigQuery · Claude · Pydantic
Shipping in 2 weeks

02

NL2SQL Eval Framework + Public Leaderboard

Shipping Soon

An open benchmark of Claude, GPT, Gemini, and Llama on realistic BigQuery-style schemas, with a live leaderboard updated as new models drop.

Evals are the most underserved skill in production AI. A live leaderboard is also a content engine.

Python · DuckDB · Next.js · Vercel · Spider
Shipping in 4-5 weeks

03

Voice Mock Interview Coach

Shipping Soon

Real-time full-duplex voice agent that runs mock AI engineer interviews and gives feedback. Latency-tuned for conversational feel.

Voice demos are impressive and rare. Solves a real pain (mine, and every job seeker's).

OpenAI Realtime · WebRTC · Next.js · FastAPI
Shipping in 7-8 weeks

04

Personal AI Research Assistant

Shipping Soon

Ingests arXiv papers, blogs, and PDFs into a personal RAG index. Weekly digest, semantic search across your library, and a local-first option via Ollama.

I need it. Tools you actually use end up well-built.

Python · Postgres · pgvector · Ollama · FastAPI
Shipping in 10-11 weeks

05

prod-llm-starter

Shipping Soon

Opinionated production template for LLM apps. FastAPI + LangGraph + Pydantic + Postgres/pgvector + eval harness + cost dashboard + auth + GHA. The thing every AI engineer wishes existed on day one.

Flagship. Utility repos compound. Forces deep understanding of every choice.

FastAPI · LangGraph · pgvector · Prometheus · Docker · GHA
Shipping in 12-13 weeks

the track record

Where I've been

Oct 2025 to Present

AI Engineer

HypeOn AI

Building production LLM systems for D2C trend prediction. Multi-stage orchestration with routing/retrieval/composition, NL-to-SQL over BigQuery with cost guardrails, RAG pipelines with sentence-transformers, deployment on GCP Cloud Run with full observability.

LangChain · FastAPI · BigQuery · Cloud Run · Claude · Gemini

Oct 2024 to Sep 2025

Freelance ML / AI Engineer

Independent

Built an AI-powered inventory system for a retail client: LLM-based invoice extraction, demand forecasting with scikit-learn, real-time stock alerts, and a dashboard that surfaces insights.

Python · Pandas · Scikit-learn · OpenAI · SQL

Jun to Sep 2024

Backend Developer Intern

Synclovis Systems

Built the REST backend for an event-management web app. Also contributed to an internal LLM-based healthcare assistant, integrating RAG retrieval over clinical documents and adding guardrails.

Node.js · Express · MySQL · LangChain · FAISS

2020 to 2024

B.Tech, Computer Science

K.S.R.M College of Engineering, JNTU Anantapur

Graduated with CGPA 8.14 / 10.

how i think

Engineering principles

Coding is the easy part. Building the right system for a problem that keeps shifting is where the work actually lives.

Tradeoffs over tools

Pick by constraint, not hype. Postgres + pgvector beats a managed vector DB until it doesn't. Knowing when each breaks is the actual skill.

Evals before scale

If you cannot measure it, you cannot improve it. A bad eval beats no eval. A good eval beats opinions in standups.

Data quality over model swapping

A new model rarely fixes bad inputs. Time spent on retrieval quality, prompt structure, and labeled failures pays compounding interest.

Infrastructure is the product

Latency, cost, and reliability are features users feel. The model is one component of a system that has to stay up.

Ship narrow, then expand

One user, one workflow, working end-to-end. A tiny system that ships beats a grand system that demos.

AI pair programming with judgment

I use Claude Code, Cursor, and copilots aggressively. Then I reason through every architectural choice myself. Tools speed up typing; judgment can't be delegated.

the toolkit

What I work with

Tools I use day to day. Not a list of every framework I've heard of.

LLMs & GenAI

Claude (Sonnet, Haiku) · Gemini · OpenAI · LangChain · LangGraph · Pydantic · sentence-transformers · FAISS

Backend

Python · FastAPI · SQLAlchemy · Alembic · Celery · REST · SSE

Data

BigQuery · PostgreSQL · pgvector · Redis · Pandas · NumPy

Cloud & Infra

GCP · Cloud Run · Cloud SQL · GCS · Memorystore · AWS (S3, EC2) · Oracle Cloud

DevOps & Observability

Docker · GitHub Actions · Prometheus · Sentry · Git

Frontend

TypeScript · Next.js · React · Tailwind · Framer Motion

say hello

Let's build something.

I'm open to roles in production AI, especially teams shipping LLM systems, RAG, agents, or NL interfaces over data. Remote or San Francisco. Quick replies.