
CI/CD Troubleshooting Guide

Resolving common issues, optimizing performance, and following best practices

This guide helps you diagnose and resolve common CI/CD pipeline issues for both NX-based and standalone architectures.

Common Issues & Solutions

Feature Branch CI Issues

Problem: Feature branch builds failing

# Check workflow status
gh workflow list
gh run list --branch feature/my-feature

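# View detailed logs (take <run-id> from the list above; omit it to pick a run interactively)
gh run view <run-id> --log

# Show only the logs of failed steps
gh run view <run-id> --log-failed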

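Solution: Make sure the feature branch is up to date with main. Fetch first so the rebase uses the current remote state rather than a stale local main:

git fetch origin
git checkout feature/my-feature
git rebase origin/main
git push --force-with-lease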
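
Before rebasing, it can help to confirm that the branch is actually behind. The following is a minimal sketch; the function name commits_behind is hypothetical, and it assumes the refs you pass already exist locally (for example after git fetch origin).

```shell
# Print how many commits <branch> is missing from <base>.
# Usage: commits_behind <base> <branch>
commits_behind() {
    # "branch..base" selects commits reachable from base but not from branch.
    git rev-list --count "$2..$1"
}

# Example (run inside a clone after `git fetch origin`):
#   commits_behind origin/main feature/my-feature
```

A result of 0 suggests the failure is not caused by a stale branch, so the logs are the better place to look next.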

Tag Cutting Issues

Problem: Manual tag action not appearing

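  • Check Trigger and Permissions: Ensure the workflow declares workflow_dispatch under on:, and that you have write access to the repository
  • Branch Protection: If the workflow pushes tags or commits, verify that the protection and tag rules on main allow it
  • Action Visibility: Confirm the workflow file is on the default branch (main); the manual run control only appears for workflows present there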
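
The trigger check can be scripted. This is a rough sketch: has_manual_trigger is a hypothetical helper, and it does a plain text match that ignores comment lines rather than parsing the YAML, so treat a match as a hint only.

```shell
# Succeed if the given workflow file mentions the workflow_dispatch trigger
# on a line that is not a comment. This is a text match, not a YAML parse.
has_manual_trigger() {
    grep -v '^[[:space:]]*#' "$1" | grep -q 'workflow_dispatch'
}

# Example (the workflow path is an assumption; use your own file name):
#   has_manual_trigger .github/workflows/cut-tag.yml && echo "trigger declared"
```

To check the copy on the default branch rather than your working tree, write the output of git show origin/main:<path-to-workflow> to a temporary file and pass that file to the helper.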

Problem: Image not found during tag cutting

# Authenticate first; without a login, a private image can be reported as not found
docker login registry.company.com

# Verify image exists in registry
docker pull registry.company.com/service:commit-sha

Pipeline Performance Issues

NX Build Optimization

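# Clear the NX cache and stop the NX daemon
npx nx reset

# Inspect the project graph to find dependencies that slow down builds
# (older NX versions name these commands dep-graph and affected:dep-graph)
npx nx graph
npx nx graph --affected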

Standalone Build Optimization

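# Reclaim disk space (this removes unused data and build cache, so the next build starts cold)
docker system prune
docker builder prune

# Use BuildKit for faster builds (already the default builder since Docker Engine 23.0)
export DOCKER_BUILDKIT=1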

Performance Optimization

Build Speed Improvements

For NX-Based Systems

  1. Cache Optimization: Configure proper NX caching
  2. Dependency Graph: Optimize build order
  3. Parallel Execution: Enable parallel builds where possible
  4. Selective Testing: Run only affected tests

For Standalone Systems

  1. Docker Layer Caching: Optimize Dockerfile layer order
  2. Dependency Caching: Cache node_modules, pip packages, etc.
  3. Build Context: Minimize Docker build context size
  4. Multi-stage Builds: Use multi-stage for smaller final images

GitHub Actions Optimization

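# Example optimization techniques
jobs:
  build:
    runs-on: ubuntu-latest

    # Parallel matrix builds
    strategy:
      matrix:
        node-version: [16, 18, 20]

    steps:
      # hashFiles() below needs the repository contents in the workspace
      - uses: actions/checkout@v4

      # Use action caching
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}

      # Conditional steps (head_commit is only populated on push events)
      - name: Run tests
        if: contains(github.event.head_commit.message, '[test]')
        run: npm test  # assumed test command; replace with your own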
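
To see why the cache key above changes only when dependencies change, here is a local analogue. It is a sketch, not GitHub's implementation: cache_key is a hypothetical helper, it hashes a single lockfile with sha256sum (GNU coreutils assumed), and the value it prints will differ from what hashFiles() produces.

```shell
# Build a cache key from an OS label and the SHA-256 digest of a lockfile.
# Usage: cache_key <os-label> <lockfile>
cache_key() (
    os=$1
    lockfile=$2
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    digest=$(sha256sum "$lockfile" | cut -d' ' -f1)
    printf '%s-node-%s\n' "$os" "$digest"
)

# Example:
#   cache_key Linux package-lock.json
```

Identical lockfile contents give the same key (a cache hit), while any change to the lockfile gives a new key and a fresh cache entry.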

Debugging Workflows

GitHub Actions Debugging

Enable Debug Logging

# Set as repository secrets or variables (Settings > Secrets and variables > Actions)
ACTIONS_RUNNER_DEBUG: true
ACTIONS_STEP_DEBUG: true

Common Debug Commands

# Check runner environment (run inside a workflow step; ${{ }} expressions are not expanded in a local shell)
echo "Runner OS: ${{ runner.os }}"
echo "GitHub workspace: ${{ github.workspace }}"
echo "GitHub event: ${{ github.event_name }}"

# Debug file permissions
ls -la
pwd
whoami

Container Debugging

Local Docker Testing

# Build and run locally
docker build -t test-image .
docker run -it test-image /bin/bash

# Check container layers
docker history test-image

# Inspect image
docker inspect test-image

Registry Issues

# Test registry connectivity
docker login registry.company.com
docker pull hello-world
docker tag hello-world registry.company.com/test:latest
docker push registry.company.com/test:latest

Performance Monitoring

Key Metrics to Track

Metric         | Target       | Action if Target Is Missed
Build Duration | < 10 minutes | Optimize dependencies and caching
Test Execution | < 5 minutes  | Parallelize tests, run only affected tests
Image Size     | < 500 MB     | Use multi-stage builds, optimize the base image
Success Rate   | > 95%        | Investigate frequent failures
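
The upper-bound targets in the table can be enforced in a pipeline step. This is a minimal sketch: check_below and the metric names are hypothetical, values are plain integers in whatever unit you pick (seconds, MB), and it covers only the "less than" rows, not the success rate.

```shell
# Report whether a measured integer value stays below its limit.
# Usage: check_below <metric-name> <value> <limit>
check_below() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: $2 is not below $3"
        return 1
    fi
    echo "$1: $2 is below $3"
}

# Example: a 10 minute build budget expressed in seconds.
#   check_below build_duration_s 420 600
```

Because the function returns a non-zero status when the limit is reached, a workflow step that calls it fails visibly instead of letting the regression pass unnoticed.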

Monitoring Tools

GitHub Actions Insights

  • Workflow run history: Identify patterns in failures
  • Job duration trends: Track performance over time
  • Resource usage: Monitor runner utilization

Custom Monitoring

# Add timing to workflows
- name: Build with timing
  run: |
    start_time=$(date +%s)
    npm run build
    end_time=$(date +%s)
    echo "Build took $((end_time - start_time)) seconds"
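
If several steps need timing, the same pattern can be wrapped in a helper. The sketch below assumes a POSIX shell; time_step is a hypothetical name, and the resolution is whole seconds because it relies on date +%s.

```shell
# Run a command, report how long it took, and preserve its exit status.
# Usage: time_step <label> <command> [args...]
time_step() (
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "$label took $((end - start)) seconds (exit $status)"
    exit "$status"
)

# Example:
#   time_step "Build" npm run build
```

Passing the exit status through matters: a failed build still fails the step, and the timing line is printed either way.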

Security Troubleshooting

Secret Management Issues

Problem: Secrets not available in workflow

Solutions:

  1. Check secret scope (repository vs. organization)
  2. Verify workflow permissions
  3. Ensure secret names match exactly (case-sensitive)
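
A missing or misnamed secret reaches the step as an empty value, which often fails much later with a confusing error. A fail-fast check at the start of the job makes the cause obvious. This is a sketch under assumptions: require_env is a hypothetical helper, REGISTRY_TOKEN is a made-up name, and the secrets are assumed to be mapped to environment variables through the step's env: block.

```shell
# Fail when any of the named environment variables is unset or empty.
# Usage: require_env <NAME> [NAME...]
require_env() (
    missing=0
    for name in "$@"; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Example (REGISTRY_TOKEN is a hypothetical secret name):
#   require_env REGISTRY_TOKEN || exit 1
```

The helper prints only variable names, never values, so it is safe to leave in workflow logs.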

Problem: Token permissions insufficient

# Add proper permissions to workflow
permissions:
contents: read
packages: write
id-token: write

Container Security

Vulnerability Scanning

# Local security scanning
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy image your-image:tag

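# Check for known vulnerabilities (npm) and conflicting requirements (pip)
npm audit
pip check

# List outdated dependencies
npm outdated
pip list --outdated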

Best Practices

Feature Branch Strategy

  1. Keep Branches Small: Easier to validate and merge
  2. Regular Rebasing: Stay current with main branch
  3. Descriptive Names: Use prefixes like feature/, bugfix/, hotfix/
  4. Clean History: Squash commits before merge
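
The naming convention can be checked automatically, for example in a local hook or an early CI step. The sketch below covers only the three prefixes listed above; valid_branch_name is a hypothetical helper name.

```shell
# Succeed if the branch name starts with an accepted prefix
# and has at least one character after the slash.
valid_branch_name() {
    case "$1" in
        feature/?*|bugfix/?*|hotfix/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: check the branch that is currently checked out.
#   valid_branch_name "$(git rev-parse --abbrev-ref HEAD)" || echo "rename this branch"
```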

Tag Cutting Guidelines

  1. Semantic Versioning: Follow semver (major.minor.patch)
  2. Release Notes: Document changes in each release
  3. Environment Testing: Validate in staging before production tag
  4. Rollback Plan: Ensure previous versions remain available
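
Validating the tag format before cutting a release catches typos early. The following sketch assumes tags use a v prefix followed by major.minor.patch, and it deliberately ignores pre-release and build suffixes; is_semver_tag and next_version are hypothetical helpers.

```shell
# Succeed if the argument looks like vMAJOR.MINOR.PATCH (no leading zeros).
is_semver_tag() {
    printf '%s\n' "$1" |
        grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# Print the next tag for a bump of type major, minor, or patch.
# Usage: next_version <current-tag> <major|minor|patch>
next_version() (
    is_semver_tag "$1" || { echo "not a semver tag: $1" >&2; exit 1; }
    version=${1#v}
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    case "$2" in
        major) major=$((major + 1)); minor=0; patch=0 ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        patch) patch=$((patch + 1)) ;;
        *) echo "unknown bump type: $2" >&2; exit 1 ;;
    esac
    echo "v$major.$minor.$patch"
)

# Example:
#   next_version v1.2.2 patch
```

A minor bump resets the patch number, and a major bump resets both minor and patch, matching the usual semver convention.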

Pipeline Maintenance

  1. Regular Updates: Keep actions and dependencies current
  2. Monitoring: Set up alerts for pipeline failures
  3. Documentation: Keep runbooks updated
  4. Testing: Validate pipeline changes in development environments

Emergency Procedures

Pipeline Outage Response

Immediate Actions

  1. Assess Impact: Determine affected services and environments
  2. Communicate: Notify stakeholders via Slack/incident channels
  3. Investigate: Check GitHub Actions status, runner availability
  4. Workaround: Consider manual deployment if critical

Escalation Path

  1. Platform Team: First line of support for CI/CD issues
  2. DevOps Lead: For architectural decisions
  3. Engineering Manager: For business impact decisions

Rollback Procedures

Failed Deployment Rollback

# For NX-based systems
git revert <commit-hash>
git push origin main

# For standalone systems
# Redeploy the previous tag, then wait for the rollout to finish
kubectl set image deployment/app app=registry.company.com/app:v1.2.2
kubectl rollout status deployment/app
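
Finding the tag to roll back to can be scripted. This is a sketch with assumptions: previous_tag is a hypothetical helper, tags follow the vMAJOR.MINOR.PATCH form, and sort -V (version sort, a GNU coreutils extension) is available.

```shell
# Read tags on stdin and print the one that sorts immediately before <current>.
# Prints nothing if <current> is the lowest tag or is not in the list.
# Usage: <tag list> | previous_tag <current>
previous_tag() {
    sort -V | awk -v current="$1" '
        $0 == current { if (prev != "") print prev; exit }
        { prev = $0 }
    '
}

# Example: list release tags from git and find the one before v1.2.3.
#   git tag --list "v*" | previous_tag v1.2.3
```

Version sort matters here: a plain alphabetical sort would place v1.10.0 before v1.9.0 and suggest the wrong rollback target.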

Database Migration Rollback

# Always test rollback scripts
npm run migrate:down
# or (do not add --fake here: it only rewrites the migration record and leaves the schema unchanged)
python manage.py migrate app_name 0001

This troubleshooting guide is maintained by the DevOps and Platform Engineering teams.