Gemini 2.5 Flash Live Preview on macOS (Apple Silicon) : End-to-End Tutorial

This is a complete, guide for setting up Gemini 2.5 Flash Live Preview (real-time audio + optional camera/screen frames) on a Mac with Miniconda.


What you’ll build

A Python script (live_preview_stream.py) that:

  • streams your mic audio to Gemini and plays back the model’s voice in real time,
  • optionally streams webcam frames (--mode camera) or screen captures (--mode screen),
  • lets you also send typed text in the terminal.

Prerequisites

  • macOS on Apple Silicon (M1/M2/M3/M4).
  • Miniconda (or Anaconda).
  • (Optional) Homebrew — only needed if you prefer to install PortAudio via brew.

If you don’t have Miniconda: https://zahiralam.com/blog/conda-installation-on-apple-silicon-mac-simplified-step-by-step-instructions/


1) Create a clean conda environment

We’ll use Python 3.11 because the script uses asyncio.TaskGroup and ExceptionGroup (which are 3.11 features).

conda create -n gemini-live python=3.11 -y
conda activate gemini-live

2) Install PortAudio and PyAudio (audio I/O)

Option A (keeps everything inside the conda env) ✅

# install PortAudio into the env
conda install -c conda-forge portaudio -y

# tell the compiler to look inside this env for headers/libs
export CFLAGS="-I$CONDA_PREFIX/include"
export LDFLAGS="-L$CONDA_PREFIX/lib"
export PKG_CONFIG_PATH="$CONDA_PREFIX/lib/pkgconfig"

# build PyAudio against that PortAudio
python -m pip install --no-binary :all: pyaudio

Verify audio devices

python - << 'PY'
import pyaudio
p = pyaudio.PyAudio()
print("OK. Devices:")
for i in range(p.get_device_count()):
    d = p.get_device_info_by_index(i)
    print(i, d["name"], int(d["maxInputChannels"]), int(d["maxOutputChannels"]))
p.terminate()
PY

You should see your iPhone MicrophoneMacBook Pro MicrophoneSpeakers, etc.

If you see mic errors: On macOS, go to System Settings → Privacy & Security → Microphone and allow your Terminal/iTerm/VS Code. For camera mode, also allow Camera.


3) Install the SDK and vision/screen dependencies

python -m pip install -U google-genai opencv-python pillow mss
  • google-genai — the new official SDK (from google import genai)
  • opencv-python (cv2) — webcam capture + color conversions
  • pillow — image re-encoding (JPEG)
  • mss — screen capture

4) Provide your API key

You have three safe choices; pick one.

Option 1 — Environment variable (recommended for servers/CI)

export GOOGLE_API_KEY="YOUR_REAL_KEY"

Option 2 — .env file (nice for local dev)

Create .env in your project:

GOOGLE_API_KEY=YOUR_REAL_KEY

Load it early in your script:

from dotenv import load_dotenv
load_dotenv()  # pip install python-dotenv

Option 3 — Inline (quick demo only)

client = genai.Client(api_key="YOUR_REAL_KEY")

5) The working script (key parts)

Your live_preview_stream.py already has all the right plumbing. Keep your logic and queues; ensure the following details match:

  • Audio settings (good already):FORMAT = pyaudio.paInt16 CHANNELS = 1 SEND_SAMPLE_RATE = 16000 # mic → model RECEIVE_SAMPLE_RATE = 24000 # model → speakers CHUNK_SIZE = 1024
  • Live connect (already good):async with client.aio.live.connect(model=MODEL, config=CONFIG) as session: self.session = session ...
  • Camera frames (you already fixed BGR→RGB):frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) img = PIL.Image.fromarray(frame_rgb) # re-encode to JPEG, base64, send with mime "image/jpeg"
  • Screen frames (via mss) — you re-encode to JPEG before sending.
  • Messages to model:
    Your code uses await self.session.send(...). That’s fine today. The SDK prints a DeprecationWarning(future change, not urgent). You can ignore it, silence it, or later migrate:
    • Text / images / screen → send_client_content(...)
    • Realtime audio chunks → send_realtime_input(...)

To silence the warning for now:

import warnings
warnings.filterwarnings(
    "ignore",
    message="The `session.send` method is deprecated",
    category=DeprecationWarning,
)

6) Run it

From your project folder:

conda activate gemini-live

# voice only
python live_preview_stream.py --mode none

# webcam + mic
python live_preview_stream.py --mode camera

# screen + mic
python live_preview_stream.py --mode screen

When it starts you’ll see message >. Type to send text; speak to send audio. Press q + Enter to exit.


Quality checks & tips

  • List audio devices (we did this) to ensure mic/speaker are visible to PyAudio.
  • macOS permissions: the Terminal/iTerm/VS Code app must be allowed to use Microphone (and Camera for camera mode).
  • Frame rate: you already throttle to ~1 FPS with await asyncio.sleep(1.0) which is perfect for “preview” without blowing the context.
  • .env for local dev: add python-dotenv to avoid exporting every time:python -m pip install python-dotenv and put this near the top of your script:from dotenv import load_dotenv load_dotenv()

Copy-paste install block (Apple Silicon)

# 1) env
conda create -n gemini-live python=3.11 -y
conda activate gemini-live

# 2) audio stack
conda install -c conda-forge portaudio -y
export CFLAGS="-I$CONDA_PREFIX/include"
export LDFLAGS="-L$CONDA_PREFIX/lib"
export PKG_CONFIG_PATH="$CONDA_PREFIX/lib/pkgconfig"
python -m pip install --no-binary :all: pyaudio

# 3) sdk + vision/screen
python -m pip install -U google-genai opencv-python pillow mss

# 4) api key (pick one method)
export GOOGLE_API_KEY="YOUR_REAL_KEY"

# 5) run
python live_preview_stream.py --mode none
# or
python live_preview_stream.py --mode camera
# or
python live_preview_stream.py --mode screen

Leave a Reply

Your email address will not be published. Required fields are marked *