If you work with data pipelines, SQL, notebooks, or machine learning models, a Mac with Apple Silicon is genuinely one of the best machines you can have as a daily driver. The unified memory architecture means your CPU and GPU share the same memory pool, which matters a lot when you are running Docker containers, a local Postgres instance, Jupyter, and an AWS CLI session all at once without the machine breaking a sweat. This post covers which model to pick and how to get the whole stack running from scratch. Some of the product links below are affiliate links, meaning I may earn a small commission if you purchase through them, at no extra cost to you.
1. Picking the right model
The current MacBook line runs entirely on M5 chips. For most data engineers doing cloud-first work (Redshift, Glue, dbt, Airflow), the MacBook Air M5 with 16GB is more than enough. If you run heavier local workloads like training models, spinning up multiple containers, or working with large DataFrames in memory, the MacBook Pro 14" M5 Pro with 24GB is the right step up. The Pro has active cooling, which matters when you push it hard for extended periods. The 36GB M5 Max is only worth it if local deep learning training is a regular part of your day; otherwise you are paying for headroom you will rarely use.
2. First thing: Homebrew and Xcode tools
Everything else in this setup depends on Homebrew, the package manager for macOS. Before installing it, you need Apple's command line developer tools, which also give you Git.
# Install Xcode command line tools
xcode-select --install

# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Add Homebrew to your PATH (Apple Silicon path)
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
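If you are not sure whether the /opt/homebrew path in the PATH step matches your machine, a small Python sketch like the one below can tell you. It assumes Homebrew's default prefixes on macOS (/opt/homebrew on Apple Silicon, /usr/local on Intel) and a standard install; the names DEFAULT_PREFIX, expected_brew_path, and brew_status are made up for this example and are not part of Homebrew.

```python
import platform
import shutil

# Default Homebrew prefixes on macOS, keyed by CPU architecture.
# Assumption: the standard installer was used, with no custom prefix.
DEFAULT_PREFIX = {"arm64": "/opt/homebrew", "x86_64": "/usr/local"}


def expected_brew_path(machine: str) -> str:
    """Return where the brew binary normally lives for a given architecture."""
    prefix = DEFAULT_PREFIX.get(machine)
    if prefix is None:
        raise ValueError(f"no default Homebrew prefix known for {machine!r}")
    return f"{prefix}/bin/brew"


def brew_status() -> dict:
    """Compare the brew found on PATH (if any) with the expected location."""
    machine = platform.machine()
    prefix = DEFAULT_PREFIX.get(machine)
    return {
        "machine": machine,
        "found": shutil.which("brew"),
        "expected": f"{prefix}/bin/brew" if prefix else None,
    }


if __name__ == "__main__":
    print(brew_status())
```

If "machine" comes back as x86_64 on an Apple Silicon Mac, the Python interpreter itself is most likely running under Rosetta, which is worth sorting out before you install anything else.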
3. Python environment management with uv
Forget installing Python directly or relying on conda for everything. The current best practice on Apple Silicon is uv, a fast Python package and project manager that handles Python versions and virtual environments without polluting your global install. It is significantly faster than pip and plays well with Jupyter.
# Install uv
brew install uv

# Create a new project environment
mkdir my-de-project && cd my-de-project
uv init

# Add packages (same idea as pip install)
uv add pandas sqlalchemy boto3 jupyterlab scikit-learn

# Launch Jupyter inside the project environment
uv run --with jupyter jupyter lab
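To confirm the packages actually landed in the project environment, you can run a quick check like this one. It is only a sketch: the file name check_env.py and the helper missing_modules are my own inventions, and the list simply mirrors the uv add line in this section.

```python
import importlib.util

# Import names for the packages added with uv. The import name can differ
# from the install name: scikit-learn is imported as sklearn.
REQUIRED = ["pandas", "sqlalchemy", "boto3", "jupyterlab", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print(f"missing: {missing}")
    else:
        print("all packages importable")
```

Save it in the project folder and run it with uv run python check_env.py so that it uses the project's interpreter rather than the system one.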
4. Core tools via Homebrew
These cover the daily stack for data engineering: a cloud CLI, a database client, a container runtime, a few command-line utilities, and an editor.
# AWS CLI
brew install awscli

# PostgreSQL client (psql without the full server)
brew install libpq
brew link --force libpq

# Docker via OrbStack (lighter than Docker Desktop on Apple Silicon)
brew install orbstack

# Git, jq, and wget
brew install git jq wget

# VS Code
brew install --cask visual-studio-code
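Once those installs finish, it helps to check that every command really is on your PATH. The sketch below does that with the standard library only. The tool list is my assumption about the executable names these installs provide; in particular, I am assuming OrbStack has been opened once so it can set up the docker command, and that the VS Code cask links the code launcher.

```python
import shutil

# Executable names expected from the Homebrew installs in this section.
# Assumption: docker appears after OrbStack's first launch, and code is
# linked by the visual-studio-code cask.
TOOLS = ["aws", "psql", "docker", "git", "jq", "wget", "code"]


def locate_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools(TOOLS).items():
        print(f"{name:<8}{path or 'NOT FOUND'}")
```

Anything reported as NOT FOUND usually means a new terminal window is needed, or that the app behind the command has not been launched yet.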
5. PyTorch with Metal GPU acceleration
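Apple Silicon GPUs run PyTorch through the Metal Performance Shaders (MPS) backend, which gives you real GPU acceleration for model training without needing CUDA. It requires macOS 12.3 or later and ships with the standard PyTorch build for macOS, so there is nothing extra to install. PyTorch does not use it by default, though: you still move your model and tensors to the "mps" device yourself.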
# Install PyTorch (Apple Silicon native build), run this in the shell
uv add torch torchvision torchaudio

# Verify MPS is available (Python from here on)
import torch
print(torch.backends.mps.is_available())  # should print True on Apple Silicon

# Move a model to the MPS device
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = torch.nn.Linear(8, 1)  # placeholder, use your own model here
model = model.to(device)
One thing worth knowing: MPS does not support every PyTorch operation yet. If you hit an unsupported op, set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 before starting Python, and PyTorch will fall back to CPU for that specific operation (it prints a warning when it does) while keeping everything else on the GPU.
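Here is a minimal sketch of wiring that fallback into a script. It assumes the variable has to be in the environment before torch is imported, which is why it is set at the very top. The pick_device helper is my own name, and the try/except around the import is only there so the script still runs on a machine without torch.

```python
import os

# Assumption: the variable must be set before torch is imported.
# setdefault keeps any value you already exported in your shell.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

try:
    import torch
except ImportError:  # lets the script run where torch is not installed
    torch = None


def pick_device() -> str:
    """Return "mps" when the Metal backend is usable, otherwise "cpu"."""
    if torch is not None and torch.backends.mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("device:", device)
    if torch is not None:
        # A small matrix multiply to confirm the device actually works.
        x = torch.randn(1024, 1024, device=device)
        y = (x @ x).relu().sum()
        print("result:", y.item())
```

You can get the same effect without touching the code by exporting the variable in your shell before launching Python or Jupyter.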
6. One limitation to consider
Macs do not support CUDA, and that is a real constraint if your production training runs on NVIDIA GPUs or if your team uses CUDA-specific libraries. The practical answer most engineers land on is using the Mac for development, prototyping, and running notebooks, then pushing actual training jobs to Amazon SageMaker, Google Colab, or a cloud GPU instance. The unified memory architecture makes the Mac excellent for loading large quantized models locally (a 7B or 13B parameter model fits comfortably in 24GB of unified memory), but for serious fine-tuning or multi-GPU training, cloud is still the right call.
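The arithmetic behind that 24GB claim is easy to check yourself. The sketch below estimates the size of the weights alone, as parameter count times bits per weight. It deliberately ignores the KV cache, activations, and whatever macOS and your other apps are using, so treat the numbers as a lower bound. The function name weights_gb is made up for this example.

```python
def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 10**9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for params in (7, 13):
        for bits in (4, 8, 16):
            size = weights_gb(params, bits)
            print(f"{params}B at {bits}-bit: ~{size:.1f} GB")
```

A 13B model at 4-bit comes out at roughly 6.5 GB of weights, which leaves plenty of room in 24GB. The same model at 16-bit needs about 26 GB for the weights alone, which is why quantization is what makes local loading practical.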
A solid machine that gets out of your way
The real reason data engineers gravitate toward Macs is not any single spec. It is the combination of a Unix shell that works the way you expect, excellent battery life, and hardware that handles a full data stack locally without fan noise or thermal throttling on everyday tasks. Getting to a productive environment takes less than an hour with Homebrew, uv, and OrbStack in place. After that, you have a PostgreSQL client, Docker, AWS CLI, Jupyter, and PyTorch with GPU acceleration all running natively on Apple Silicon, which is a genuinely capable local setup for most data and AI workflows.
