Anatomy of a Dockerfile
A Dockerfile is a text file containing instructions to build a Docker image. Each instruction creates a new layer in the image.
A layer in Docker is essentially a set of filesystem changes that represent a specific instruction (like FROM, RUN, COPY, ADD(which I’ve never had to use) ) in a Dockerfile. Layers are cacheable units, and can be shared across multiple containers
Essential Dockerfile Instructions
FROM - The Foundation
# Always start with a base image
FROM node:18-alpine
# Use specific tags, avoid 'latest' in production
FROM ubuntu:22.04
# Multi-stage builds start here
FROM golang:1.21 AS builder
WORKDIR - Set Working Directory
# Sets the working directory for subsequent instructions
WORKDIR /app
# Creates the directory if it doesn't exist
# Much better than using RUN cd /app, allegedly
COPY vs ADD
# COPY - preferred for simple file copying
COPY package*.json ./
COPY src/ ./src/
# ADD - has extra features (auto-extraction, URLs)
ADD https://example.com/file.tar.gz /tmp/
ADD archive.tar.gz /extracted/ # Auto-extracts
RUN - Execute Commands
# Each RUN creates a new layer - combine commands to reduce layers
RUN apt-get update && apt-get install -y \
curl \
vim \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN npm ci --only=production
EXPOSE - Document Ports
# EXPOSE is purely documentation - it does NOT publish ports!
EXPOSE 3000
EXPOSE 80 443
# Think of it as a hint to other developers about what ports your app uses
# To actually publish ports, you need:
# docker run -p 3000:80 myapp (maps host:container)
# or in docker-compose:
# ports:
# - "3000:3000"
# EXPOSE helps with:
# 1. Documentation - shows intended ports
# 2. Container linking (legacy Docker feature)
# 3. Random port assignment: docker run -P myapp
# (publishes all EXPOSEd ports to random host ports)
ENV - Environment Variables
ENV NODE_ENV=production
ENV PORT=3000
ENV DATABASE_URL=postgresql://localhost:5432/myapp
USER - Security Best Practice
# Don't run as root!
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
USER nextjs
CMD vs ENTRYPOINT - The Execution Difference
CMD - Provides defaults that can be overridden:
CMD ["npm", "start"]
# Running the container:
docker run myapp # Runs: npm start
docker run myapp npm run dev # Runs: npm run dev (CMD is ignored!)
docker run myapp /bin/bash # Runs: /bin/bash (CMD is ignored!)
ENTRYPOINT - Always executes, can’t be easily overridden:
ENTRYPOINT ["npm"]
# Running the container:
docker run myapp # Runs: npm (probably errors)
docker run myapp start # Runs: npm start
docker run myapp run dev # Runs: npm run dev
docker run myapp /bin/bash # Runs: npm /bin/bash (probably errors!)
Combined Pattern - Best of both worlds:
ENTRYPOINT ["npm"]
CMD ["start"]
# Running the container:
docker run myapp # Runs: npm start
docker run myapp run dev # Runs: npm run dev
docker run myapp test # Runs: npm test
# The ENTRYPOINT is always "npm", CMD provides the default argument
# You can override the argument, but npm will always be the command
Shell vs Exec Form: CMD runs in shell, while
# Shell form - runs in shell, can use shell features, but doesn't handle signals well
CMD npm start
ENTRYPOINT npm start
What actually happens:
- Runs as: /bin/sh -c “npm start”
- Creates a shell process that then runs your command
# Exec form - runs directly, better for containers, handles signals properly
CMD ["npm", "start"]
ENTRYPOINT ["npm", "start"]
# PID 1 problem: Shell form creates a shell process as PID 1,
# which doesn't properly forward signals to your app.
# Exec form makes your app PID 1, so it gets signals directly.
Complete Example - Node.js App (Multi-stage Explained)
# DEPS stage - Production dependencies only
FROM node:18-alpine AS deps
WORKDIR /app
COPY package*.json ./
# Only install production deps and clean cache to minimize layer size
RUN npm ci --only=production && npm cache clean --force
# BUILDER stage - Build the application
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install ALL dependencies (including devDependencies for building)
RUN npm ci
COPY . .
# Build the application (needs devDependencies like TypeScript, bundlers, etc.)
RUN npm run build
# RUNNER stage - Final production image
FROM node:18-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
# Copy built application from builder stage
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
# Copy ONLY production dependencies from deps stage
COPY --from=deps --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --chown=nextjs:nodejs package.json ./
USER nextjs
EXPOSE 3000
CMD ["npm", "start"]
Why separate deps and builder stages?
The key insight is that we need different dependencies at different stages:
-
DEPS stage: Only production dependencies (
--only=production)- These are needed to RUN the app in production
- Examples: express, react, lodash
-
BUILDER stage: ALL dependencies (production + dev)
- DevDependencies are needed to BUILD the app
- Examples: typescript, webpack, @types packages, testing libraries
- We don’t want these in our final image (bloat + security)
Alternative (less optimal) approach:
# This works but creates a larger final image
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci # Installs ALL deps
COPY . .
RUN npm run build
RUN npm prune --production # Remove devDeps after build
CMD ["npm", "start"]
# Problems:
# 1. Final image contains build artifacts and temp files
# 2. npm prune doesn't always clean everything
# 3. Larger attack surface
# 4. Less cacheable layers