// currently fine-tuning multimodal models at Snap

Hi, I'm
Hephzibah.

I build AI systems at scale, from high-throughput ML pipelines and visual embeddings to agentic LLM workflows and distributed infrastructure. I'm passionate about systems that think, adapt, and perform.


About Me

The story behind the systems

My path in tech started with a Smart India Hackathon win during undergrad. That early taste of building something impactful set the tone for everything after.

I kicked off my career at Oracle in Hyderabad, architecting Java/Spring Boot apps for OCI and orchestrating infrastructure migrations that cut compute time by 45%. Then came my MSCS at Northeastern, where I dove deep into AI and distributed systems.

At Akamai I built LLM-powered agentic workflows that achieved 92% triage accuracy in incident response. Currently at Snap Inc, I'm working on visual model embeddings, PEFT fine-tuning of multimodal models, and BigQuery pipelines over 50M+ row datasets.

I love the intersection of scalable systems and applied AI, building things that think and perform at real scale.

LangChain · Go · Python · TypeScript · BigQuery · VectorDB · Kubernetes · Terraform · PEFT · Fine-tuning · AWS · GCP · Azure · Anthropic Claude · Pinecone · vLLM

Technical Skills

Languages

Go · Java · Python · TypeScript · JavaScript · SQL · C/C++ · PHP · .NET · Kotlin · Bash · HTML/CSS

AI & LLM Engineering

LangChain · Anthropic Claude · OpenAI SDK · Agentic Workflows · RAG Pipelines · Multi-modal Models · PEFT Fine-tuning · Vector Embeddings · vLLM/Ollama · Pinecone

ML Evaluation & Tools

RAGAS · LangSmith · LLM-as-a-Judge · Weights & Biases · PyTorch · TensorFlow · GitHub Copilot · Cursor IDE

Backend & APIs

FastAPI · Spring Boot · Django · Flask · Node.js · Express · REST APIs · GraphQL · Microservices · Distributed Systems

Cloud & Infrastructure

AWS · GCP · Azure · OCI · Kubernetes · Docker · Terraform · GitOps · CI/CD · Jenkins

Data & Databases

BigQuery · PostgreSQL · MongoDB · Elasticsearch · Redis · Pandas/NumPy · Apache Spark · Hadoop · ETL

Frontend

React · Next.js · Angular · Vue.js · Tailwind CSS · Redux · Oracle JET

Observability & Quality

Prometheus · Grafana · New Relic · Splunk · SLO/SLI · JUnit/Mockito · Integration Testing · Unit Testing

Experience

Where I've built and learned

Machine Learning Engineer @ Snap Inc
FEB 2026 to PRESENT · Palo Alto, CA
  • Architecting a high-throughput metadata processing pipeline using BigQuery + SQL over 50M+ row datasets, improving efficiency by 25% and ensuring 99.9% data reliability.
  • Building Visual Multi-Modal Embedding pipelines that unify image and text representations into a shared vector space for real-time spatial computing and retrieval, boosting content discoverability by 12%.
  • Fine-tuning Qwen-3B-VL using PEFT techniques, tracked with Weights & Biases, reducing compute consumption by 20% while preserving task performance.
  • Designing LLM-as-a-Judge evaluation frameworks to score model outputs for quality, relevance, and hallucination risk across content pipelines.
  • Working on SEO-driven content optimization using LLMs to improve metadata relevance and discoverability signals across Snap surfaces.
  • Contributing to geo-aware recommendation systems that adapt content ranking and surfacing based on location signals and regional engagement patterns.
BigQuery · Python · PEFT · W&B · VectorDB · LLM-as-a-Judge · SEO · Geo
Software Engineer II @ Akamai Technologies
MAY 2024 to FEB 2026 · San Jose, CA
  • Designed and deployed an Elasticsearch MCP server integrating LLMs for agentic operational workflows, hitting 92% interaction accuracy in automated triage and incident response.
  • Implemented GitOps-driven CI/CD using Jenkins and Kubernetes, standardizing canary releases to enable 30% traffic scalability.
  • Integrated real-time monitoring (Kibana, Grafana), improving incident response by 50% and reducing vulnerabilities by 45%.
  • Built LangChain agentic workflows and developed evaluation pipelines with RAGAS/LangSmith to prevent hallucinations.
LangChain · Elasticsearch · Kubernetes · Grafana · GitOps · RAGAS
Software Engineer @ Nokia
JAN 2023 to SEP 2023 · Sunnyvale, CA
  • Engineered a scalable Python-based CI/CD framework across SR-OS and SR-Linux platforms, reducing manual deployment intervention by 30%.
  • Deployed and configured 15+ networking protocols (OSPF, BGP) using Docker, cutting setup time by 20%.
  • Optimized the JavaScript frontend and PHP backend, improving UI responsiveness and data visualization performance by 25%.
Python · Docker · CI/CD · BGP/OSPF · JavaScript
Software Engineer (OCI) II @ Oracle
JUN 2020 to JAN 2022 · Hyderabad, India
  • Led development of a Java/Spring Boot + Oracle JET application to optimize CPQ pricing workflows, improving processing times by 20%.
  • Integrated ML models into automated monitoring to predict pricing workflow bottlenecks before production impact.
  • Validated 50+ deployment scripts for Oracle CPQ with 90% success rate using Docker + Kubernetes.
  • Orchestrated legacy-to-OCI migration with Terraform (IaC), reducing compute time by 45%.
Java · Spring Boot · Terraform · OCI · Kubernetes
Software Developer Intern @ GE Appliances
JAN 2020 to JUN 2020 · Hyderabad, India
  • Built an internal application and REST APIs in Python and Django for hardware inventory tracking with search by dimensions and manufacturing date, improving system reliability by 30% and contributing to operations supporting $1 billion in profits.
  • Led a data migration project that eliminated dependency on expensive third-party storage, a cost-saving initiative that was applauded across the org and recognized at the leadership level.
  • Drove end-to-end design, architecture, development, testing, and documentation of an Oracle JET application that saves the company billions in operational costs. The work was recognized by the COO and earned the Best Intern award.
  • Enforced software health by maintaining 80%+ unit and integration test coverage using JUnit and Mockito, directly supporting the test engineering lifecycle for critical production releases.
Python · Django · Oracle JET · JUnit · Mockito · Data Migration · REST APIs

Things I've Built

Selected work across ML, systems, and full-stack

$run triage_agent.py
Loading LangChain + ES MCP...
Accuracy: 92%
Incidents triaged: 1,204
Hallucinations blocked: RAGAS ✓
$deploy --env prod ✓
🤖
Agentic Incident Response
LLM-powered workflow using LangChain + Elasticsearch MCP server for automated triage and incident response, with 92% accuracy at Akamai scale.
LangChain · Elasticsearch · Python · LLMs
$embed_pipeline --model Qwen-3B-VL
Dataset: 50M+ rows (BigQuery)
Method: PEFT fine-tuning
Discoverability: +12%
Compute saved: -20%
$push to vector store ✓
Visual Embedding Pipeline
Snap Inc: Visual Multi-Modal Embeddings in a unified vector space for real-time spatial computing. 12% discoverability boost over 50M+ BigQuery rows.
BigQuery · PEFT · W&B · VectorDB
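
To make the "unified vector space" idea concrete, here is a minimal Python sketch of retrieval by cosine similarity over image and text vectors that live in the same space. It is an illustration only, not the Snap pipeline: the item IDs, the three-dimensional vectors, and the `top_k` helper are all made up for the example, and a real system would get its vectors from a trained multimodal encoder and query a vector database instead of a dict.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def top_k(query, index, k=2):
    """Return the IDs of the k items most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), item_id)
              for item_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical index: images and text share one embedding space,
# so a single query can retrieve either modality.
index = {
    "image:beach": [0.9, 0.1, 0.0],
    "image:city": [0.1, 0.9, 0.2],
    "text:sunset at the shore": [0.8, 0.2, 0.1],
}

query = [1.0, 0.0, 0.0]  # assumed embedding of a beach-like query
results = top_k(query, index, k=2)
print(results)
```

Because both modalities share one space, the same query surfaces the beach image and the shore caption ahead of the unrelated city image.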
$optimize --algo dijkstra
Nodes: 2,400 stops
Traffic feed: Google Maps API
Route efficiency: ↑ 32%
Fuel cost: ↓ 18%
$deploy Spring Boot ✓
🗺
Last-Mile Delivery Optimizer
Route optimization using graph algorithms and real-time traffic data to minimize delivery times and operational cost across logistics networks.
Java · Spring Boot · Google Maps API
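
For readers curious about the graph-search step, here is a minimal Python sketch of Dijkstra's algorithm over a small graph of stops. It is not the project's Java/Spring Boot code: the stop names, the travel times, and the `shortest_travel_times` function are assumptions made for illustration. In a real deployment the edge weights would presumably come from a live traffic feed.

```python
import heapq


def shortest_travel_times(graph, source):
    """Dijkstra's algorithm: minimum travel time from source to each reachable stop.

    graph maps each stop to a dict of {neighbor: travel_time}, with
    non-negative travel times.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]  # min-heap of (time so far, stop)
    while queue:
        cost, stop = heapq.heappop(queue)
        if cost > best.get(stop, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, minutes in graph.get(stop, {}).items():
            candidate = cost + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


# Hypothetical delivery network; weights are travel times in minutes.
road_graph = {
    "depot": {"a": 4.0, "b": 10.0},
    "a": {"b": 3.0, "c": 8.0},
    "b": {"c": 2.0},
    "c": {},
}

times = shortest_travel_times(road_graph, "depot")
print(times)
```

Here the direct depot-to-b road (10 minutes) loses to the detour through a (4 + 3 = 7 minutes), which is the kind of rerouting that updated traffic weights would trigger.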
$start node --port 8080
Protocol: TCP Sockets
Replication: 3-way consensus
Failover: auto ✓
Ext. frameworks: none
$cluster healthy ✓
Distributed Key-Value Store
Socket-level distributed KV store with consensus and fault-tolerant data replication across nodes, built from scratch with no external frameworks.
Go · Sockets · Distributed Systems
$npm start
Stack: React + Node.js
Storage: AWS S3
DB: MongoDB
Auth: JWT ✓
$infinite scroll live ✓
🎨
Pinterest-Style Web App
Full-stack image-sharing platform with infinite scroll, authentication, saved posts, and cloud media storage.
React · Node.js · MongoDB · AWS S3
$./gradlew build
Lang: Kotlin · MVVM
Backend: Firebase
Video: ExoPlayer
Architecture: Clean ✓
$BUILD SUCCESSFUL ✓
📱
E-Learning Android App
Native Android e-learning app with course management, video streaming, and progress tracking using Kotlin and MVVM architecture.
Kotlin · Android · Firebase · MVVM

Research & Credentials

Published work and certifications

Question Duplication using Deep Learning
Co-authored research proposing an LSTM-based neural network for semantic matching of duplicate question pairs on platforms like Quora, enabling efficient knowledge retrieval and community moderation at scale.
Applications and Challenges of Generative AI in Modern Healthcare Systems
Co-authored analysis of technical, ethical, and regulatory challenges of GenAI in healthcare, proposing bias mitigation frameworks and privacy-preserving architectures including Federated Learning.
☁️
AWS Certified Developer (Associate) · Amazon Web Services
OCI Certified Developer (Professional) · Oracle Cloud Infrastructure
🧠
ML & Deep Learning (Certified) · IBM

What's Next?

Get In Touch

I'm currently open to new opportunities. Whether you have a role, a question, or just want to talk AI systems and distributed infrastructure, my inbox is always open.

hephzibahsaidu7@gmail.com