🧪 Hands-on lab · 60 min

Ansible — Best Practices & Production

1. Idempotence
2. Check mode + diff
3. Linting
4. Vault for secrets

Idempotence

The cornerstone of usable Ansible: a playbook is correct only if re-running it on an already-converged system reports changed=0.

The trap is the command: and shell: modules. They have no built-in change detection, so every run reports the task as changed (changed=1 per host) whether or not anything actually moved.

Compare these two playbooks. Write bad.yml:

---
- name: Naive — never idempotent
  hosts: linux
  gather_facts: false
  tasks:
    - name: Make sure /etc/cac-flag exists (wrong way)
      command: touch /etc/cac-flag
...

And good.yml:

---
- name: Idempotent
  hosts: linux
  gather_facts: false
  tasks:
    - name: Make sure /etc/cac-flag exists (right way)
      file:
        path: /etc/cac-flag
        state: touch
        mode: '0644'
      register: r
      changed_when: r.diff.before.state == "absent"
...

Run each twice:

ansible-playbook bad.yml
ansible-playbook bad.yml       # still reports changed=1 on every host (wrong)
ansible-playbook good.yml
ansible-playbook good.yml      # changed=0 the second time — right
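A variant worth knowing: the file module can make state: touch idempotent by itself. By default, touch refreshes the access and modification times on every run, and the module reports that as a change. good.yml works around this with changed_when. If you set both timestamps to preserve, only creating the file (or correcting its mode) counts as a change. A sketch that reuses the same linux group; the playbook name is illustrative:

```yaml
---
- name: Idempotent touch without changed_when (illustrative variant of good.yml)
  hosts: linux
  gather_facts: false
  tasks:
    - name: Make sure /etc/cac-flag exists (timestamps left alone)
      ansible.builtin.file:
        path: /etc/cac-flag
        state: touch
        mode: '0644'
        # Without these two lines, state: touch bumps atime/mtime on every run
        # and the task reports changed each time.
        access_time: preserve
        modification_time: preserve
...
```

Run it twice. As with good.yml, the second run should report changed=0.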

Rules of thumb:

Prefer modules that compute their own diff (copy, template, file, lineinfile, package, service) over command/shell.
When you must shell out, pair it with creates: (skip if path exists), removes: (skip if path missing), or changed_when: + failed_when:.
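In practice, those guards look like the sketch below. The tarball and /opt/cac-app are hypothetical placeholders that are not part of this lab. The grep target is just a file that normally exists on a CentOS host:

```yaml
---
- name: Guarded shell-outs (illustrative only)
  hosts: linux
  gather_facts: false
  tasks:
    - name: Unpack a hypothetical tarball once
      ansible.builtin.command:
        cmd: tar -xzf /tmp/cac-app.tar.gz -C /opt
        # Assumes the archive creates /opt/cac-app; once that path exists,
        # the command is not run again.
        creates: /opt/cac-app

    - name: Count PermitRootLogin lines (read-only probe)
      ansible.builtin.command: grep -c '^PermitRootLogin' /etc/ssh/sshd_config
      register: probe
      changed_when: false        # reading a file never changes the host
      failed_when: probe.rc > 1  # grep exits 1 on zero matches, 2 on a real error
...
```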

Click Verify step once both playbooks have been applied.

Hint

Run the same playbook twice — every task should report `changed=0` the second time.

Check mode + diff

--check runs a dry-run pass: every task reports what it would change but applies nothing. Tasks whose modules don't support check mode are skipped. --diff adds a before/after preview where the module supports it.

Together they're the closest Ansible gets to "terraform plan".
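Individual tasks can opt out of either flag with the check_mode and diff task keywords. This is useful for read-only probes that must run even under --check, and for files whose contents should never be printed. A sketch; the /etc/cac-private.conf path and its content are hypothetical:

```yaml
---
- name: Per-task check_mode and diff overrides (illustrative only)
  hosts: linux
  gather_facts: false
  tasks:
    - name: Read the running kernel version (read-only, safe to run for real)
      ansible.builtin.command: uname -r
      register: kernel
      changed_when: false
      check_mode: false          # runs normally even when --check is passed

    - name: Write a file whose contents should stay out of --diff output
      ansible.builtin.copy:
        dest: /etc/cac-private.conf      # hypothetical path
        content: "token=placeholder\n"
        mode: '0600'
      diff: false                # no before/after preview for this task

    - name: Report the kernel only during a real run
      ansible.builtin.debug:
        msg: "Kernel {{ kernel.stdout }}"
      when: not ansible_check_mode
...
```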

Write diff.yml:

---
- name: Preview changes safely
  hosts: linux
  gather_facts: false
  tasks:
    - name: Drop a config file
      copy:
        dest: /etc/cac-app.conf
        content: |
          listen=8080
          debug=true
        mode: '0644'
...

Apply it once for real so the file exists:

ansible-playbook diff.yml

Now edit diff.yml to change one of the values, e.g. debug=true → debug=false. Then preview without applying:

ansible-playbook diff.yml --check --diff

You'll see the unified diff (-debug=true / +debug=false) and changed=1 for each host in the PLAY RECAP. The actual files on disk, however, are untouched.

Confirm:

ssh centos1 cat /etc/cac-app.conf
# still shows debug=true

Apply for real to converge:

ansible-playbook diff.yml

Click Verify step once /etc/cac-app.conf contains debug=false on every host.

Hint

`ansible-playbook --check --diff` previews changes without applying them.

Linting

ansible-lint enforces the canonical style + catches anti-patterns before they reach a code review. It ships as a separate package:

pip install --quiet ansible-lint
ansible-lint --version

Write a deliberately-bad flawed.yml to see it work:

---
- hosts: linux
  tasks:
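    - name: bad - uses shell with no idempotence guard
      shell: echo "hello" | tee /etc/cac-flawed
    - name: bad - sets become_user without become (old sudo habit)
      command: id
      become_user: root
    - name: bad - task name in lowercase, no full stop, and shell again
      shell: date
...

Now lint it:

ansible-lint flawed.yml

You'll see warnings along these lines (exact rule IDs vary with the ansible-lint version):

name[play] (the play has no name)
name[casing] (task names should start with an uppercase letter)
fqcn[action-core] (bare shell / command instead of ansible.builtin.shell / ansible.builtin.command)
command-instead-of-shell (use command: when no shell features are needed)
risky-shell-pipe (a pipe without set -o pipefail)
no-changed-when (command/shell with no changed_when, creates or removes guard)
partial-become (become_user set without become: true)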

Fix them up in fixed.yml. The biggest change is FQCN (fully-qualified collection names): ansible.builtin.copy instead of bare copy. The production profile of ansible-lint enforces it.

---
- name: Lint-clean version
  hosts: linux
  gather_facts: false
  become: false
  tasks:
    - name: Make sure /etc/cac-flawed exists
      ansible.builtin.copy:
        dest: /etc/cac-flawed
        content: "hello\n"
        mode: '0644'

    - name: Print id and stash it
      ansible.builtin.command: id
      register: id_out
      changed_when: false
...

ansible-lint fixed.yml
echo "exit: $?"

Exit code 0 means lint passes. Click Verify step.
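ansible-lint also reads a .ansible-lint file from the project root. That lets you pin the production profile for every run instead of passing --profile production on the command line. A minimal sketch; excluding flawed.yml only makes sense for this lab:

```yaml
---
# .ansible-lint, placed next to your playbooks
profile: production     # the rule set that includes the fqcn rules, among others
exclude_paths:
  - flawed.yml          # the deliberately broken demo playbook
skip_list: []           # rule IDs to silence entirely, e.g. yaml[line-length]
...
```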

Hint

`pip install ansible-lint`; run `ansible-lint <playbook>` to surface anti-patterns.

Vault for secrets

You don't commit passwords or API tokens to git in plaintext. ansible-vault encrypts individual values or whole files with a shared password, so only the ciphertext goes into your repo.

Pick a vault password and store it in a file only root can read (never commit this file):

echo 'cac-lab-password' > /root/ansible/.vault-pw
chmod 600 /root/ansible/.vault-pw

Encrypt a single value for inline use:

ansible-vault encrypt_string \
    --vault-password-file /root/ansible/.vault-pw \
    --name 'api_token' \
    'sk-abc123-supersecret'

The command prints a YAML block like:

api_token: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  6266...

Paste it into vars/secrets.yml:

---
api_token: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  <copy the encrypted block here>
...

Then secret.yml:

---
- name: Use a vaulted secret
  hosts: linux
  gather_facts: false
  vars_files:
    - vars/secrets.yml
  tasks:
    - name: Render it into a file (only on this run — never log it)
      copy:
        dest: /etc/cac-token.env
        content: "API_TOKEN={{ api_token }}\n"
        mode: '0600'
      no_log: true                # hide the value from stdout + logs
...

Apply with the password file:

ansible-playbook --vault-password-file /root/ansible/.vault-pw secret.yml

Check on a host that the value landed. Ansible decrypts it on the control node at run time, so the file on the host holds plaintext:

ssh centos1 cat /etc/cac-token.env

no_log: true is critical. Without it, the secret can show up in task output (a --diff run prints the new file contents, for example), and any callback plugin that forwards results (to Datadog, etc.) would ship it too.
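Beyond this lab, Ansible's tips-and-tricks guide suggests a convention that scales better than a single vaulted variable. Keep the encrypted values under a vault_ prefix in their own file (encrypt with --name 'vault_api_token'), and point plain variables at them from an unencrypted file. That way every variable name stays searchable without decrypting anything. A sketch; the file names vars/vault.yml and vars/main.yml are hypothetical:

```yaml
# vars/vault.yml: only vault_-prefixed, encrypted values
---
vault_api_token: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  <copy the encrypted block here>
...

# vars/main.yml: plaintext, safe to grep and review
---
api_token: "{{ vault_api_token }}"
...
```

secret.yml would then list both files under vars_files, and its tasks keep using api_token as before.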

Click Verify step when /etc/cac-token.env contains the decrypted token on every host.

Hint

`ansible-vault encrypt_string '<value>' --name '<varname>'` for inline use.