
# README

Go OpenAI Realtime API Client


A fully-featured Go client for the OpenAI Realtime API, supporting multi-modal conversations with text and audio. This project is a heavily refactored and restructured fork of WqyJh/go-openai-realtime.

Overview

This client allows you to integrate with OpenAI's Realtime API to build applications that can have natural, streaming conversations with OpenAI models like GPT-4o. The Realtime API enables:

  • Multi-modal conversations - Support for both text and audio modalities
  • Bidirectional streaming - Real-time streaming of both user inputs and model outputs
  • Voice interactions - Voice input/output with a choice of voices and audio formats
  • Function calling - Define tools/functions that the model can call
  • Turn detection - Voice Activity Detection (VAD) for natural conversation turns

This client library provides a complete implementation of all 28 incoming and 9 outgoing message types supported by the Realtime API.

Key Differences from Original

This fork has been extensively refactored to provide:

  • More modular package structure with clear separation of concerns
  • Comprehensive godoc documentation for every package and type
  • Type-safe API with well-defined interfaces
  • Improved error handling and logging
  • Expanded examples for various use cases

Supported Models

  • gpt-4o-realtime-preview
  • gpt-4o-realtime-preview-2024-10-01
  • gpt-4o-realtime-preview-2024-12-17
  • gpt-4o-mini-realtime-preview
  • gpt-4o-mini-realtime-preview-2024-12-17

Installation

go get github.com/Mliviu79/go-openai-realtime

This library requires Go 1.23 or later.

Quick Start

Here's a simple example to get started:

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/Mliviu79/go-openai-realtime/messages/incoming"
	"github.com/Mliviu79/go-openai-realtime/messaging"
	"github.com/Mliviu79/go-openai-realtime/openaiClient"
	"github.com/Mliviu79/go-openai-realtime/session"
)

func main() {
	// Get API key from environment
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		log.Fatal("OPENAI_API_KEY environment variable is required")
	}

	// Create a context with timeout
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Create a client
	client := openaiClient.NewClient(apiKey)

	// Connect to the API with the specified model
	conn, err := client.Connect(ctx, 
		openaiClient.WithModel(session.GPT4oRealtimePreview))
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}

	// Create messaging client to handle the protocol
	msgClient := messaging.NewClient(conn)

	// Send a text message
	err = msgClient.SendTextMessage(ctx, "Tell me about the OpenAI Realtime API", nil)
	if err != nil {
		log.Fatalf("Failed to send message: %v", err)
	}

	// Read and process messages
	fmt.Println("Response:")
	for {
		msg, err := msgClient.ReadMessage(ctx)
		if err != nil {
			log.Fatalf("Error reading message: %v", err)
		}

		// Handle different message types
		switch m := msg.(type) {
		case *incoming.ResponseTextDeltaMessage:
			// Print text deltas as they arrive
			fmt.Print(m.Delta.Text)
		case *incoming.ResponseDoneMessage:
			// End of the response
			fmt.Println("\nResponse complete")
			return
		}
	}
}
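
To run the example, set the OPENAI_API_KEY environment variable and use go run. Note that the switch above handles only the text delta and response-completion messages; the other incoming message types the API can send are simply ignored in this minimal example.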

Session Management

You can create and manage sessions either through the REST API or WebSocket messages:

// Create a session via REST API
model := session.GPT4oRealtimePreview
modalities := []session.Modality{session.ModalityText, session.ModalityAudio}

createReq := &session.CreateRequest{
    SessionRequest: session.SessionRequest{
        Model:      &model,
        Modalities: &modalities,
    },
}

sessionResp, err := client.CreateSession(ctx, createReq)
if err != nil {
    log.Fatalf("Failed to create session: %v", err)
}

// Connect using the session ID
conn, err := client.Connect(ctx,
    openaiClient.WithModel(model),
    openaiClient.WithSessionID(sessionResp.ID))

Advanced Features

Audio Input/Output

The client supports sending and receiving audio in various formats:

// Configure audio formats in the session
inputFormat := session.AudioFormatPCM16
outputFormat := session.AudioFormatPCM16
voice := session.VoiceAlloy

createReq := &session.CreateRequest{
    SessionRequest: session.SessionRequest{
        Model:             &model,
        Modalities:        &[]session.Modality{session.ModalityText, session.ModalityAudio},
        InputAudioFormat:  &inputFormat,
        OutputAudioFormat: &outputFormat,
        Voice:             &voice,
    },
}

// Send audio data
audioData := []byte{...} // Your PCM16 audio data
err = msgClient.SendAudioMessage(ctx, audioData, nil)
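
The pcm16 format expected by the API is 16-bit, little-endian, mono PCM (24 kHz by default). The sketch below is not part of this library; it uses only encoding/binary and math from the standard library to convert normalized float samples into the byte layout that SendAudioMessage expects:

// floatToPCM16 converts samples in the range [-1, 1] to little-endian
// 16-bit PCM bytes. Sample rate and channel count are assumed to already
// match the session's input audio configuration.
func floatToPCM16(samples []float64) []byte {
	buf := make([]byte, 2*len(samples))
	for i, s := range samples {
		// Clamp to [-1, 1], then scale to the int16 range.
		s = math.Max(-1, math.Min(1, s))
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(int16(s*math.MaxInt16)))
	}
	return buf
}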

Function Calling

You can define functions for the model to call:

// Define a tool/function
getWeatherParams := map[string]interface{}{
    "type": "object",
    "properties": map[string]interface{}{
        "location": map[string]interface{}{
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
    },
    "required": []string{"location"},
}

tools := []session.Tool{
    {
        Type: "function",
        Function: &session.Function{
            Name:        "get_weather",
            Description: "Get the current weather in a given location",
            Parameters:  getWeatherParams,
        },
    },
}

// Add tools to the session
createReq := &session.CreateRequest{
    SessionRequest: session.SessionRequest{
        Model: &model,
        Tools: &tools,
    },
}
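
When the model invokes get_weather, the arguments arrive as a JSON string matching the schema above. The sketch below is a stdlib-only illustration (encoding/json and fmt) of decoding that string into a typed struct; the concrete incoming message type that carries the arguments is defined in the messages/incoming package and is not shown here:

// weatherArgs mirrors the JSON Schema declared in getWeatherParams.
type weatherArgs struct {
	Location string `json:"location"`
}

// handleGetWeather decodes the arguments string delivered with the model's
// function call and acts on it.
func handleGetWeather(argsJSON string) error {
	var args weatherArgs
	if err := json.Unmarshal([]byte(argsJSON), &args); err != nil {
		return fmt.Errorf("decode function call arguments: %w", err)
	}
	fmt.Println("Weather requested for:", args.Location)
	return nil
}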

Turn Detection

Enable Voice Activity Detection (VAD) for natural conversation turns:

// Configure server-side Voice Activity Detection (VAD)
turnDetectionType := "server_vad"
threshold := 0.5          // VAD activation threshold (0.0-1.0); higher requires louder speech
prefixPaddingMs := 300    // audio (in ms) to include before detected speech
silenceDurationMs := 500  // silence (in ms) that marks the end of a turn
createResponse := true    // automatically create a response when the turn ends
interruptResponse := true // interrupt an in-progress response when new speech starts

turnDetection := &session.TurnDetection{
    Type:              &turnDetectionType,
    Threshold:         &threshold,
    PrefixPaddingMs:   &prefixPaddingMs,
    SilenceDurationMs: &silenceDurationMs,
    CreateResponse:    &createResponse,
    InterruptResponse: &interruptResponse,
}

createReq := &session.CreateRequest{
    SessionRequest: session.SessionRequest{
        Model:         &model,
        TurnDetection: turnDetection,
    },
}

Comprehensive Example

For a complete demonstration of all message types and features, see the comprehensive example.

This example shows:

  • All 28 incoming message types
  • All 9 outgoing message types
  • Audio handling
  • Function calling
  • Turn detection
  • Error handling

Package Structure

The library is organized into several packages with clear separation of concerns:

  • openaiClient - Main client package for API connection
  • session - Session management and configuration
  • messages - Message types and handling
    • incoming - Incoming message types
    • outgoing - Outgoing message types
    • types - Shared message type definitions
    • factory - Message factory functions
  • messaging - High-level messaging interface
  • ws - WebSocket connection management
  • httpClient - HTTP client for REST API endpoints
  • logger - Logging utilities
  • apierrs - Error handling

Documentation

Every package and exported type is documented with godoc; see the Package Structure section above and the package list below for an overview of what each package provides.

License

This project is licensed under the MIT License - see the LICENSE file for details.

# Packages

Package apierrs provides error types and handling for the OpenAI Realtime API.
Package main provides a comprehensive test for all OpenAI Realtime API message types.
Package httputil provides HTTP client utilities for making API requests.
Package logger provides logging functionality for the OpenAI Realtime API client.
Package messages provides types and functions for working with OpenAI Realtime API messages.
Package messaging provides high-level messaging functionality for the OpenAI Realtime API.
Package openaiClient provides a client for the OpenAI Realtime API.
Package session provides session management for the OpenAI Realtime API.
Package ws provides WebSocket functionality for the OpenAI Realtime API.