Karan Sharma

Adding Prometheus configuration to your CI/CD workflow

4 minutes (1039 words)

Prometheus configurations can turn into a mess in no time if you have a lot of different jobs scraping different targets. Sure, you can use tools like jsonnet to keep your YAML files DRY, but this post is not about that. I initially started off by writing one job to scrape a set of EC2 instances in a particular AWS VPC. Over time, I needed to do the same for 3-4 different sets of targets, each with its own rules and its own type of exporter. The whole routine of SSHing into the server, opening the config file in vim and editing it right there just didn’t feel right. No surprises for guessing that I f*cked up a few times with config errors (YAML, sigh) and finally decided that I needed a better solution for this.
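
To give an idea of the kind of config that kept growing, here’s a rough sketch of one such scrape job using EC2 service discovery (the job name, region and port below are made up for illustration):

scrape_configs:
  - job_name: ec2-node-exporter      # hypothetical job scraping node_exporter on EC2 instances
    ec2_sd_configs:
      - region: ap-south-1           # assumed AWS region
        port: 9100                   # node_exporter's default port
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance       # use the EC2 "Name" tag as the instance label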

🔗CI/CD for config files

“Wait, so now you’re telling me that I need to set up a full-fledged pipeline just for a few config files?” must have been the thought that echoed in your head. Before you jump to the conclusion that it’s overdoing things, IMHO it’s not. And it’s not that difficult to set up either, so why not? After integrating CI in your workflow, you can be confident that a bad PromQL query or broken YAML syntax won’t take down your monitoring system.
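
For context, the files being linted here are ordinary Prometheus alerting rules. A minimal, made-up example of what lives under rules/alerts/ would be something like:

groups:
  - name: instance-alerts            # hypothetical rule group
    rules:
      - alert: InstanceDown
        expr: up == 0                # promtool validates this PromQL expression
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been down for 5 minutes"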

“That sounds amazing, show me teh code already.” Our pipeline is fairly simple:

Gitlab Pipeline

You can automate the last step too, but I wanted to keep at least one manual check in this system, so I decided against it. And since I don’t have a distributed Prometheus setup yet, it’s simple to keep things the old-school way here.

You can do pretty much all of the steps in the above pipeline in any CI of your choice, but I am using Gitlab, so here’s a sample .gitlab-ci.yml file:

stages:
  - lint
  - deploy

prometheus-lint:
  stage: lint # run this job on stage lint
  image: golang:1.11-alpine # pull go1.11 image from official docker repo
  before_script:
    - apk update && apk add git # install git
  script:
    - GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go get -u github.com/prometheus/prometheus/cmd/promtool # fetches promtool package
    - $GOPATH/bin/promtool check rules rules/alerts/* # run the `check rules` command for my prometheus rule files

push-to-s3:
  stage: deploy # run this job on stage deploy
  environment:
    name: production # tag the job metadata in this environment. useful to quickly revert deploys when shit hits the fan
  image: python:3.7-alpine
  script:
    - pip install awscli # adds aws cli tools
    - aws s3 sync prometheus/rules s3://mybucket/rules/ # pushes our rule config files to s3
  only:
    refs: # only allow deploy if branch is master.
      - master
  when: manual # trigger this job manually

A brief explanation of the above file: the prometheus-lint job runs in the lint stage; it pulls promtool and runs promtool check rules over all the alert rule files, so the pipeline fails on any bad rule before it ever reaches a server. The push-to-s3 job runs in the deploy stage; it syncs the rule files to an S3 bucket, only runs on the master branch and has to be triggered manually. The AWS credentials that awscli needs are configured as Gitlab CI/CD environment variables rather than living in the repo.
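
If you want to run the same check locally before pushing, it’s just the two commands from the lint job (assuming you have Go set up on your machine):

go get -u github.com/prometheus/prometheus/cmd/promtool   # same tool the CI job installs
$GOPATH/bin/promtool check rules rules/alerts/*           # same check the lint stage runs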

Gitlab Environment Variables

🔗Automate everything? Nah mate

This is what the deploy.sh script looks like:

#!/bin/sh
set -e  # bail out if any step fails, so we never restart Prometheus with a broken config
aws s3 sync s3://mybucket/rules/ /etc/prometheus/config/rules/  # pull the latest rule files from S3
promtool check config /etc/prometheus/prometheus.yml            # validate the full config before restarting
sudo service prometheus restart                                 # reload Prometheus with the new rules
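
For the synced files to actually be loaded, prometheus.yml needs to point at that directory. The relevant fragment would look roughly like this (the path matches the sync target above; the glob is an assumption about how the rule files are named):

# fragment of /etc/prometheus/prometheus.yml
rule_files:
  - /etc/prometheus/config/rules/*   # load every rule file synced from S3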

As much as I’d like to be a cool hipster and run /deploy from my Slack bot (last I checked, there’s a legit term for this: ChatOps), I simply don’t prefer that. Having human intervention before critical deployments like this is OK, IMHO. I don’t update the Prometheus config often, so I don’t mind actually SSHing into the single instance and triggering a shell script which does the job for me. I also don’t have a distributed Prometheus setup as of yet. Things will definitely change based on your requirements; there’s no one size that fits all.

I’m much more confident now with my config changes and don’t have to pray to the server overlords every time I restart Prometheus.

It just works.

Tags: #Monitoring #Prometheus #Devops #CI/CD