This is a complete, guide for setting up Gemini 2.5 Flash Live Preview (real-time audio + optional camera/screen frames) on a Mac with Miniconda.
What you’ll build
A Python script (live_preview_stream.py
) that:
- streams your mic audio to Gemini and plays back the model’s voice in real time,
- optionally streams webcam frames (
--mode camera
) or screen captures (--mode screen
), - lets you also send typed text in the terminal.
Prerequisites
- macOS on Apple Silicon (M1/M2/M3/M4).
- Miniconda (or Anaconda).
- (Optional) Homebrew — only needed if you prefer to install PortAudio via brew.
If you don’t have Miniconda: https://zahiralam.com/blog/conda-installation-on-apple-silicon-mac-simplified-step-by-step-instructions/
1) Create a clean conda environment
We’ll use Python 3.11 because the script uses asyncio.TaskGroup
and ExceptionGroup
(which are 3.11 features).
conda create -n gemini-live python=3.11 -y
conda activate gemini-live
2) Install PortAudio and PyAudio (audio I/O)
Option A (keeps everything inside the conda env) ✅
# install PortAudio into the env
conda install -c conda-forge portaudio -y
# tell the compiler to look inside this env for headers/libs
export CFLAGS="-I$CONDA_PREFIX/include"
export LDFLAGS="-L$CONDA_PREFIX/lib"
export PKG_CONFIG_PATH="$CONDA_PREFIX/lib/pkgconfig"
# build PyAudio against that PortAudio
python -m pip install --no-binary :all: pyaudio
Verify audio devices
python - << 'PY'
import pyaudio
p = pyaudio.PyAudio()
print("OK. Devices:")
for i in range(p.get_device_count()):
d = p.get_device_info_by_index(i)
print(i, d["name"], int(d["maxInputChannels"]), int(d["maxOutputChannels"]))
p.terminate()
PY
You should see your iPhone Microphone, MacBook Pro Microphone, Speakers, etc.
If you see mic errors: On macOS, go to System Settings → Privacy & Security → Microphone and allow your Terminal/iTerm/VS Code. For camera mode, also allow Camera.
3) Install the SDK and vision/screen dependencies
python -m pip install -U google-genai opencv-python pillow mss
google-genai
— the new official SDK (from google import genai
)opencv-python
(cv2
) — webcam capture + color conversionspillow
— image re-encoding (JPEG)mss
— screen capture
4) Provide your API key
You have three safe choices; pick one.
Option 1 — Environment variable (recommended for servers/CI)
export GOOGLE_API_KEY="YOUR_REAL_KEY"
Option 2 — .env
file (nice for local dev)
Create .env
in your project:
GOOGLE_API_KEY=YOUR_REAL_KEY
Load it early in your script:
from dotenv import load_dotenv
load_dotenv() # pip install python-dotenv
Option 3 — Inline (quick demo only)
client = genai.Client(api_key="YOUR_REAL_KEY")
5) The working script (key parts)
Your live_preview_stream.py
already has all the right plumbing. Keep your logic and queues; ensure the following details match:
- Audio settings (good already):
FORMAT = pyaudio.paInt16 CHANNELS = 1 SEND_SAMPLE_RATE = 16000 # mic → model RECEIVE_SAMPLE_RATE = 24000 # model → speakers CHUNK_SIZE = 1024
- Live connect (already good):
async with client.aio.live.connect(model=MODEL, config=CONFIG) as session: self.session = session ...
- Camera frames (you already fixed BGR→RGB):
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) img = PIL.Image.fromarray(frame_rgb) # re-encode to JPEG, base64, send with mime "image/jpeg"
- Screen frames (via
mss
) — you re-encode to JPEG before sending. - Messages to model:
Your code usesawait self.session.send(...)
. That’s fine today. The SDK prints a DeprecationWarning(future change, not urgent). You can ignore it, silence it, or later migrate:- Text / images / screen →
send_client_content(...)
- Realtime audio chunks →
send_realtime_input(...)
- Text / images / screen →
To silence the warning for now:
import warnings
warnings.filterwarnings(
"ignore",
message="The `session.send` method is deprecated",
category=DeprecationWarning,
)
6) Run it
From your project folder:
conda activate gemini-live
# voice only
python live_preview_stream.py --mode none
# webcam + mic
python live_preview_stream.py --mode camera
# screen + mic
python live_preview_stream.py --mode screen
When it starts you’ll see message >
. Type to send text; speak to send audio. Press q
+ Enter to exit.
Quality checks & tips
- List audio devices (we did this) to ensure mic/speaker are visible to PyAudio.
- macOS permissions: the Terminal/iTerm/VS Code app must be allowed to use Microphone (and Camera for camera mode).
- Frame rate: you already throttle to ~1 FPS with
await asyncio.sleep(1.0)
which is perfect for “preview” without blowing the context. - .env for local dev: add
python-dotenv
to avoid exporting every time:python -m pip install python-dotenv
and put this near the top of your script:from dotenv import load_dotenv load_dotenv()
Copy-paste install block (Apple Silicon)
# 1) env
conda create -n gemini-live python=3.11 -y
conda activate gemini-live
# 2) audio stack
conda install -c conda-forge portaudio -y
export CFLAGS="-I$CONDA_PREFIX/include"
export LDFLAGS="-L$CONDA_PREFIX/lib"
export PKG_CONFIG_PATH="$CONDA_PREFIX/lib/pkgconfig"
python -m pip install --no-binary :all: pyaudio
# 3) sdk + vision/screen
python -m pip install -U google-genai opencv-python pillow mss
# 4) api key (pick one method)
export GOOGLE_API_KEY="YOUR_REAL_KEY"
# 5) run
python live_preview_stream.py --mode none
# or
python live_preview_stream.py --mode camera
# or
python live_preview_stream.py --mode screen