Early this year I got a Mac Mini (M4 Pro, 64 GB) as my home lab to tinker with AI. It's certainly not the most powerful machine, but it has enough memory to run some reasonably large LLMs locally, it makes sharing configs with my other Apple laptops (work + personal) easy, and it's small enough that I can tuck it into a corner of my desk and forget about it.
This document is a "live journal" of my setup, mainly as a note to my future self, and hopefully also some inspiration for others.
Basic Sysadmin Stuff
Enable SSH
sudo systemsetup -setremotelogin on

Install Tailscale
Install Tailscale on Mac Mini and also other devices you want to connect to it. Then on your client devices:
ssh-copy-id user@my-mac-mini # so you don't have to type a password every time
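Optionally, an SSH host alias saves retyping the user and hostname on every connection. In the sketch below, my-mac-mini and user are placeholders for your machine's Tailscale MagicDNS name (or tailnet IP) and your macOS account name:

```shell
# Optional convenience: a Host alias so plain `ssh my-mac-mini` works.
# "my-mac-mini" and "user" are placeholders -- substitute your Tailscale
# MagicDNS name (or tailnet IP) and your macOS account name.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host my-mac-mini
    HostName my-mac-mini
    User user
EOF
```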
ssh user@my-mac-mini

LLM
Cloud-based LLM CLIs
To use cloud-based LLM CLIs, you first need to set up your Node.js environment. I recommend using nvm (Node Version Manager) to manage Node.js versions.
- Install nvm: Follow the instructions on the nvm GitHub page.
- Configure nvm for automatic version switching: To automatically switch to the correct Node.js version for a project, create a .nvmrc file in the project's root directory. You can find more details in the nvm documentation.
- Install LLM CLIs: Once nvm is set up, you can install your favorite LLM CLIs. For example, to use the LTS version of Node.js and install some popular CLIs, you can do the following:
# Create a .nvmrc file to use the LTS version
echo "lts/*" > .nvmrc
# Install the CLIs globally
npm install -g @openai/codex @google/gemini-cli @anthropic-ai/claude-code

Simon Willison's LLM CLI
See Setup - LLM
# Install llm via UV
uv tool install llm
# set API key for uvx?
uv tool run llm keys set openai # ~/Library/Application\ Support/io.datasette.llm/keys.json
# Example with the Hacker News plugin
llm install llm-hacker-news
uv tool run llm -f hn:43984860 'summary with illustrative direct quotes'

Local LLM
For guidance on which models fit this hardware (M4 Pro, 273 GB/s memory bandwidth, 64 GB RAM), see How to estimate local model performance. For community benchmarks and model recommendations, see Running Local LLM.
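A useful rule of thumb from those posts: single-stream decode speed is roughly memory bandwidth divided by the bytes of weights read per token, since generating each token streams the whole (active) model from memory. A back-of-the-envelope sketch, where the 20 GB model footprint is an assumed example rather than a benchmark:

```shell
# Rough upper bound on decode tokens/sec: bandwidth / model size.
# 273 GB/s is the M4 Pro's memory bandwidth; 20 GB is an assumed
# footprint for a ~32B model at 4-5 bit quantization.
BANDWIDTH_GBS=273
MODEL_GB=20
echo "~$((BANDWIDTH_GBS / MODEL_GB)) tok/s upper bound"
```

Real throughput lands below this bound once compute and overhead kick in; MoE models like Qwen3-30B-A3B fare better because only the active experts' weights are read per token.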
Ollama
ollama pull deepseek-r1:32b
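Once the model is pulled, the Ollama server exposes an HTTP API (default port 11434) that you can sanity-check with curl before wiring up any UI; the model name matches the pull above:

```shell
# One-off generation against the local Ollama server (default port 11434).
# "stream": false returns a single JSON object instead of chunked events.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:32b", "prompt": "Say hi in five words.", "stream": false}'
```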
uv tool install open-webui
uv tool run open-webui serve

Qwen3 using MLX
Running Qwen3 on your macbook, using MLX, to vibe code for free
Besides the setup process for Localforge, it's essentially serving the model via mlx-lm:
mlx_lm.server --model mlx-community/Qwen3-30B-A3B-8bit --trust-remote-code --port 8082

We can combine the tip from simonw's HN comment with a fix to force Python 3.12:
uv run --isolated -p 3.12 --with mlx-lm mlx_lm.server --model mlx-community/Qwen3-30B-A3B-8bit --trust-remote-code --port 8082

Speech to Text
From parakeet-mlx
uvx parakeet-mlx default_tc.mp3

Running Whisper.cpp Locally
ffmpeg -i <input_file> -ar 16000 -ac 1 -c:a pcm_s16le <output_filename>.wav
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
./models/download-ggml-model.sh large-v3-turbo
cmake -B build
cmake --build build --config Release
./build/bin/whisper-cli -m models/ggml-large-v3-turbo.bin -f <input_file> -l zh -otxt <output_filename>