Summary
- Built with OpenAI’s Realtime API using WebSockets for low-latency, bi-directional speech.
- Seamlessly connects voice input to a vector database for semantic searching of clinical guidelines.
- Features server-side Voice Activity Detection (VAD), enabling users to hold a fluid, natural dialogue.
- Remembers returning users via Caller ID for personalized greetings, stored email addresses, and preferred speaking speed.
- Includes a tool to send RAG search summaries via email.
Starting from the standalone Clinical Audit Platform Phase 1 project, I built a voice route that leverages the OpenAI Realtime API to provide a seamless, conversational interface for querying complex medical documents directly through the browser.

Upon entering a caller ID (which will be a real user's phone number in production), users establish a low-latency WebSocket connection that enables natural, bi-directional communication. The system employs server-side Voice Activity Detection (VAD) so users can interrupt the AI naturally at any moment, mimicking real human conversation. When a query is posed, the assistant uses a Retrieval-Augmented Generation (RAG) pipeline to semantically search the indexed clinical guidelines (specifically for diabetes and dermatitis) and synthesizes the findings into spoken responses with accurate citations.
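To make the flow concrete, here is a minimal sketch of the session setup, assuming a Node.js relay using the `ws` package (browsers cannot attach custom headers to WebSocket requests); the tool name `search_guidelines` and the stubbed event handling are illustrative, not the project's actual code.

```typescript
import WebSocket from "ws";

// Connect to the Realtime API over WebSocket, typically via a server-side relay.
const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

ws.on("open", () => {
  // Server-side VAD lets the model detect speech boundaries and user
  // interruptions; the tool definition exposes the RAG pipeline to the model.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        turn_detection: { type: "server_vad" },
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
        tools: [
          {
            type: "function",
            name: "search_guidelines",
            description:
              "Semantically search indexed clinical guidelines (diabetes, dermatitis) and return passages with citations",
            parameters: {
              type: "object",
              properties: { query: { type: "string" } },
              required: ["query"],
            },
          },
        ],
      },
    })
  );
});

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  // When the model calls the tool, run the vector-database search and return
  // the results so it can synthesize a spoken, cited answer.
  if (event.type === "response.function_call_arguments.done") {
    const { query } = JSON.parse(event.arguments);
    // ... run the RAG search with `query`, then send the output back via a
    // conversation.item.create event followed by response.create.
  }
});
```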

The architecture prioritizes audio fidelity and user experience. It includes a custom audio processing pipeline that optimizes the PCM16-to-Float32 conversion and applies dynamic gain control to prevent distortion.
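A rough sketch of that conversion step follows, assuming audio arrives in 16-bit PCM chunks; the clip-avoidance heuristic shown here is illustrative rather than the exact gain algorithm used.

```typescript
// Convert a chunk of PCM16 audio to Float32 samples in [-1, 1), applying a
// simple per-chunk gain reduction when the audio would exceed the target peak.
function pcm16ToFloat32(buffer: ArrayBuffer, targetPeak = 0.9): Float32Array {
  const pcm = new Int16Array(buffer);
  const out = new Float32Array(pcm.length);

  // Normalize each 16-bit sample and track the chunk's peak amplitude.
  let peak = 0;
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
    peak = Math.max(peak, Math.abs(out[i]));
  }

  // Dynamic gain: attenuate only when the chunk would clip past the target
  // peak, leaving quiet audio untouched to avoid pumping artifacts.
  if (peak > targetPeak) {
    const gain = targetPeak / peak;
    for (let i = 0; i < out.length; i++) out[i] *= gain;
  }
  return out;
}
```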
The system delivers a personalized experience by recognizing returning users and recalling their name, email, and preferred speaking speed (users can provide this information during the call). The AI can also generate and email a detailed RAG search summary containing key insights and guideline references.
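As an illustration, the personalization lookup and the email tool could look roughly like this; the `CallerProfile` shape, the in-memory store, and the `send_search_summary` schema are assumptions for the sketch, not the project's actual definitions.

```typescript
// Hypothetical profile shape for a returning caller.
interface CallerProfile {
  name: string;
  email: string;
  speakingRate: number; // preferred speaking speed, e.g. 1.0 = normal
}

// Illustrative store keyed by caller ID (a phone number in production);
// a real deployment would back this with a database.
const callers = new Map<string, CallerProfile>();

// Greet returning callers by name; prompt first-time callers for details.
function greet(callerId: string): string {
  const profile = callers.get(callerId);
  return profile
    ? `Welcome back, ${profile.name}!`
    : "Hello! May I have your name and email?";
}

// Tool the model can call to email a RAG search summary with citations.
const sendSummaryTool = {
  type: "function",
  name: "send_search_summary",
  description:
    "Email the caller a summary of the guideline search, including key insights and guideline references",
  parameters: {
    type: "object",
    properties: {
      email: { type: "string" },
      summary: { type: "string" },
      citations: { type: "array", items: { type: "string" } },
    },
    required: ["email", "summary"],
  },
};
```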



