YAML Guide

Nobl9 TechDocs: YAML Guide

This document explains how Nobl9 configuration objects are represented in the sloctl API, and how you can express them in YAML format.

Overall Schema

apiVersion: n9/v1alpha
kind: Agent | AlertMethod | AlertPolicy | DataExport | Direct | Service | SLO | Project | RoleBinding
metadata:
  name: string
  displayName: string   # optional
  project: string   # optional
spec:
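
For instance, a minimal Project object fits this schema as follows (a sketch: the field values are illustrative, and the spec shown assumes only an optional description field):

apiVersion: n9/v1alpha
kind: Project
metadata:
  name: default                      # must be unique among Projects
  displayName: Default Project       # optional
spec:
  description: Example project grouping related services and SLOs  # assumed optional field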

Notes

Note that names must be unique within a project: you cannot have two objects of the same kind with the same name in a given project.

Caution: If your sloctl version is older than 0.0.56, you will not be able to use kind: Project or kind: RoleBinding.

Service

A Service is a high-level grouping of SLOs.

apiVersion: n9/v1alpha
kind: Service
metadata:
  name: string
  displayName: string  # optional
  project: string
spec:
  description: string  # optional, up to 1050 characters
  serviceType: string  # example: WebApp

Example:
apiVersion: n9/v1alpha
kind: Service
metadata:
  name: webapp-service
  displayName: Webapp Service
  project: default
spec:
  description: Service to connect to internal notification system
  serviceType: WebApp

SLO

A service level objective (SLO) is a target value or range of values for a service level that is measured by a service level indicator (SLI).

💡 Note: Specific attributes are described in detail in the Notes and under each integration section.

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: string
  displayName: string # optional
  project: string
spec:
  description: string # optional
  service: [service name] # name of the service defined in the same project as the SLO
  indicator:
    metricSource:
      name: [data source name] # name of the data source you defined
      project: [project name] # optional; if not defined, the project is the same as the SLO's
      kind: [resource kind] # optional, defaults to "Agent"; possible values: Agent, Direct
    rawMetric:
      # exactly one source type, depending on the kind of metricSource
      prometheus: # add this if using Prometheus
        promql: # query, such as cpu_usage_user{cpu="cpu-total"}
      # or
      newRelic: # add this if using New Relic
        nrql: # query, such as SELECT average(duration * 1000) FROM Transaction TIMESERIES
      # or
      datadog: # add this if using Datadog
        query: # query, such as sum:ingest.ok{*}
      # or
      appDynamics:
        applicationName: # AppD application name such as "n9"
        metricPath: # AppD metrics path such as "End User Experience|App|End User Response Time"
      # or
      lightstep:
        streamId:   # ID of a metric stream
        typeOfData: # type enumeration, one of (latency | error_rate | good | total)
        percentile: # numeric, used with "typeOfData: latency"
      # or
      splunk:
        query:     # search with a specific source and latency etc.
        fieldName: # name of numeric field from query results to be used
      # or
      splunkObservability:
        query:     # search with a specific source and latency etc.
  timeWindows:
    # exactly one item, one of possible rolling time window or calendar aligned
    # rolling time window
    - unit: Day | Hour | Minute
      count: numeric
      isRolling: true
    # or
    # calendar aligned time window
    - unit: Year | Quarter | Month | Week | Day
      count: numeric # count of time units; for example, count: 7 with unit: Day means a 7-day window
      calendar:
        startTime: 2020-01-21 12:30:00 # date with time in 24h format, format without time zone
        timeZone: America/New_York # name as in IANA Time Zone Database
      isRolling: false # false or not defined
  budgetingMethod: Occurrences | Timeslices
  objectives:  # see objectives below for details
  alertPolicies:
    - string # The name of the alert policy associated with this SLO
             # (the alert policy must be from the same project as the SLO).
             # Each SLO can have 0 to 5 alert policies.

Notes

SLO using AppDynamics

Examples of an SLO using AppDynamics as a threshold SLI (raw metric) and as a ratio SLI (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: appdynamics-threshold
  displayName: AppDynamics Threshold
  project: appdynamics
spec:
  description: 95th percentile of End User Response 1 Week Calendar
  service: appdynamics-service
  indicator:
    metricSource:
      name: appdynamics-agent
    rawMetric:
      appDynamics:
        applicationName: "myApplication"
        metricPath: "End User Experience|App|End User Response Time 95th percentile (ms)"
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Acceptable
      op: lte
      value: 10000
      target: 0.75
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: appdynamics-ratio
  displayName: AppDynamics Ratio
  project: appdynamics
spec:
  description: AppDynamics End User Response 1 Week Calendar
  service: appdynamics-service
  indicator:
    metricSource:
      name: appdynamics-agent
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Slow Requests
      value: 10
      target: 0.50
      countMetrics:
        incremental: false
        good:
          appDynamics:
            applicationName: "myApplication"
            metricPath: "End User Experience|App|Slow Requests"
        total:
          appDynamics:
            applicationName: "myApplication"
            metricPath: "End User Experience|App|Normal Requests"
    - displayName: Very Slow Requests
      value: 50
      target: 0.91
      countMetrics:
        incremental: false
        good:
          appDynamics:
            applicationName: "myApplication"
            metricPath: "End User Experience|App|Very Slow Requests"
        total:
          appDynamics:
            applicationName: "myApplication"
            metricPath: "End User Experience|App|Normal Requests"

The metric specification for AppDynamics has two fields:

- applicationName – name of the AppDynamics application
- metricPath – path to the AppDynamics metric

SLO Using Amazon CloudWatch

Using Amazon CloudWatch, users can create their SLOs by selecting a metric with configuration parameters, by using an SQL query, or by combining multiple metrics with JSON queries.

Examples of an SLO using Amazon CloudWatch as a threshold metric (raw metric) and as a ratio metric (count metric):

---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rolling-occurrences-threshold
  project: cloudwatch
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
    rawMetric:
      cloudwatch:
        region: eu-central-1
        namespace: AWS/RDS
        metricName: ReadLatency
        stat: Average
        dimensions:
          - name: DBInstanceIdentifier
            value: prod-rds-1-inst-a
  service:  cloudwatch-service
  objectives:
    - target: 0.8
      op: lte
      value: 0.0004
  timeWindows:
    - count: 1
      isRolling: true
      unit: Hour
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rolling-timeslices-threshold
  project: cloudwatch
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
    rawMetric:
      cloudwatch:
        region: eu-central-1
        namespace: AWS/RDS
        metricName: ReadLatency
        stat: Average
        dimensions:
          - name: DBInstanceIdentifier
            value: prod-rds-1-inst-a
  service:  cloudwatch-service
  objectives:
    - target: 0.8
      op: lte
      value: 0.0004
      timeSliceTarget: 0.5
  timeWindows:
  - count: 1
    isRolling: true
    unit: Hour
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-calendar-occurrences-ratio
  project: cloudwatch
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  service:  cloudwatch-service
  objectives:
  - target: 0.9
    countMetrics:
      good:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: HTTPCode_Target_2XX_Count
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/prod-default-appingress
      incremental: false
      total:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: RequestCount
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/prod-default-appingress
    displayName: ""
    value: 1
  timeWindows:
  - calendar:
      startTime: "2020-11-14 12:30:00"
      timeZone: Etc/UTC
    count: 1
    isRolling: false
    unit: Day
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rolling-occurrences-ratio
  project: cloudwatch
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  service:  cloudwatch-service
  objectives:
  - target: 0.7
    countMetrics:
      good:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: HTTPCode_Target_2XX_Count
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/prod-default-appingress
      incremental: false
      total:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: RequestCount
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/prod-default-appingress
    displayName: ""
    value: 1
  timeWindows:
  - count: 1
    isRolling: true
    unit: Hour
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-calendar-timeslices-ratio
  project: cloudwatch
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  service:  cloudwatch-service
  objectives:
  - target: 0.5
    countMetrics:
      good:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: HTTPCode_Target_2XX_Count
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/main-default-appingress
      incremental: false
      total:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: RequestCount
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/main-default-appingress
    displayName: ""
    timeSliceTarget: 0.5
    value: 1
  timeWindows:
  - calendar:
      startTime: "2020-11-14 12:30:00"
      timeZone: Etc/UTC
    count: 1
    isRolling: false
    unit: Day
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rolling-timeslices-ratio
  project: cloudwatch
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  service:  cloudwatch-service
  objectives:
  - target: 0.5
    countMetrics:
      good:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: HTTPCode_Target_2XX_Count
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/main-default-appingress
      incremental: false
      total:
        cloudwatch:
          region: eu-central-1
          namespace: AWS/ApplicationELB
          metricName: RequestCount
          stat: SampleCount
          dimensions:
            - name: LoadBalancer
              value: app/main-default-appingress
    timeSliceTarget: 0.5
    value: 1
  timeWindows:
  - count: 1
    isRolling: true
    unit: Hour


💡 Important Notes:

Nobl9 currently supports only the CloudWatch Metrics feature. Queries to AWS CloudWatch use a period of 60 seconds.

Both Ratio and Threshold metrics for CloudWatch use the same parameters (for a Ratio metric, define these parameters separately for the Good metric and the Total metric).

CloudWatch SLOs using SQL query:

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-occurrences-threshold-via-sql
  project: cloudwatch
spec:
  budgetingMethod: Occurrences
  indicator:
    metricSource:
      name: cloudwatch
    rawMetric:
      cloudwatch:
        region: eu-central-1
        sql: 'SELECT AVG(CPUUtilization) FROM "AWS/EC2"'
  service: cloudwatch-service
  objectives:
    - target: 0.8
      op: lte
      value: 0.0004
  timeWindows:
    - calendar:
        startTime: "2021-10-01 12:30:00"
        timeZone: Etc/UTC
      count: 1
      isRolling: false
      unit: Day
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-calendar-occurrences-ratio
  project: cloudwatch
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  service:  cloudwatch-service
  objectives:
  - target: 0.9
    countMetrics:
      good:
        cloudwatch:
          region: eu-central-1
          sql: 'SELECT AVG(CPUUtilization) FROM "AWS/EC2"'
      incremental: false
      total:
        cloudwatch:
          region: eu-central-1
          sql: 'SELECT MAX(CPUUtilization) FROM "AWS/EC2"'
    displayName: ""
    value: 1
  timeWindows:
  - calendar:
      startTime: "2020-11-14 12:30:00"
      timeZone: Etc/UTC
    count: 1
    isRolling: false
    unit: Day


Important notes: When using an SQL query, only these fields are required:

- region
- sql

CloudWatch SLOs using multiple metrics

The CloudWatch integration enables you to query multiple CloudWatch metrics and use math expressions to create new time series based on them. You can do this by entering multiple JSON queries:

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rolling-multiple-metrics-metricstat
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  objectives:
  - countMetrics:
      good:
        cloudWatch:
          json: |
            [
              {
                "Id": "e1",
                "MetricStat": {
                  "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": "HTTPCode_Target_2XX_Count",
                    "Dimensions": [
                      {
                        "Name": "LoadBalancer",
                        "Value": "app/main-default-appingress-350b/123456789"
                      }
                    ]
                  },
                  "Period": 60,
                  "Stat": "SampleCount"
                }
              }
            ]
          region: eu-central-1
      incremental: false
      total:
        cloudWatch:
          json: |
            [
              {
                "Id": "e2",
                "MetricStat": {
                  "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": "RequestCount",
                    "Dimensions": [
                      {
                        "Name": "LoadBalancer",
                        "Value": "app/main-default-appingress-350b/123456789"
                      }
                    ]
                  },
                  "Period": 60,
                  "Stat": "SampleCount"
                }
              }
            ]
          region: eu-central-1
    displayName: ""
    target: 0.5
    timeSliceTarget: 0.5
    value: 1
  service: cloudwatch
  timeWindows:
  - count: 1
    isRolling: true
    period:
      begin: "2021-11-10T12:19:58Z"
      end: "2021-11-10T13:19:58Z"
    unit: Hour
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rolling-window-json
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
    rawMetric:
      cloudwatch:
        region: eu-central-1
        json: |
            [
                {
                    "Id": "e1",
                    "Expression": "m1 / m2",
                    "Period": 60
                },
                {
                    "Id": "m1",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "AWS/ApplicationELB",
                            "MetricName": "HTTPCode_Target_2XX_Count",
                            "Dimensions": [
                                {
                                    "Name": "LoadBalancer",
                                    "Value": "app/main-default-appingress-350b/904311bedb964754"
                                }
                            ]
                        },
                        "Period": 60,
                        "Stat": "SampleCount"
                    },
                    "ReturnData": false
                },
                {
                    "Id": "m2",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "AWS/ApplicationELB",
                            "MetricName": "RequestCount",
                            "Dimensions": [
                                {
                                    "Name": "LoadBalancer",
                                    "Value": "app/main-default-appingress-350b/904311bedb964754"
                                }
                            ]
                        },
                        "Period": 60,
                        "Stat": "SampleCount"
                    },
                    "ReturnData": false
                }
            ]
  service: cloudwatch-service
  objectives:
    - target: 0.8
      op: lte
      value: 0.9
  timeWindows:
    - isRolling: true
      unit: Hour
      count: 1


Important Notes:

When using multiple queries (JSON), it is important to remember the following:

The following JSON validation applies:

For further details on CloudWatch metric math functions, go to Using Metric Math.

❗ Caution: The following CloudWatch metric features are not supported:

SLO Using Amazon Managed Service for Prometheus

# Amazon Prometheus as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: My Amazon Prometheus SLO
  name: my-amazon-prometheus-slo
  project: my-amazon-prometheus-project
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: my-amazon-prometheus-source
    rawMetric:
       prometheus:
         promql: myapp_server_requestMsec{host="*",job="nginx"}
  service: my-service
  objectives:
  - target: 0.8
    op: lte
    displayName: average
    value: 200
  - target: 0.5
    op: lte
    displayName: so-so
    value: 150
  timeWindows:
    - calendar:
        startTime: "2020-11-14 11:00:00"
        timeZone: Etc/UTC
      count: 1
      isRolling: false
      unit: Day
# Amazon Prometheus as ratio metric (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: amazon-prometheus-calendar-timeslices-ratio
  name: amazon-prometheus-calendar-timeslices-ratio
  project: my-amazon-prometheus
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: amazon-prometheus
  service: amazon-prometheus-service
  objectives:
    - target: 0.75
      countMetrics:
        good:
          prometheus:
            promql: sum(production_http_response_time_seconds_hist_bucket{method=~"GET|POST",status=~"2..|3..",le="1"})
        incremental: true
        total:
          prometheus:
            promql: sum(production_http_response_time_seconds_hist_bucket{method=~"GET|POST",le="+Inf"})
      displayName: available1
      timeSliceTarget: 0.75
      value: 1
  timeWindows:
   - calendar:
       startTime: "2020-11-14 11:00:00"
       timeZone: Etc/UTC
     count: 1
     isRolling: false
     unit: Day

The metric specification for Amazon Managed Service for Prometheus has one mandatory field:

- promql

SLO using BigQuery

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: bigquery-test
  project: default
spec:
  service: bq-service
  indicator:
    metricSource:
      name: bigquery
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      target: 0.95
      countMetrics:
        incremental: false
        good:
          bigQuery:
            projectId: "test-123"
            location: "EU"
            query: "SELECT response_time AS n9value, created AS n9date FROM `test-123.metrics.http_response`"
        total:
          bigQuery:
            projectId: "test-123"
            location: "EU"
            query: "SELECT response_time AS n9value, created AS n9date FROM `test-123.metrics.http_response`"

The BigQuery SLO requires the following three fields:

- projectId
- location
- query

SLO using Datadog

Examples of an SLO using Datadog as a threshold metric (raw metric) and as a ratio metric (count metric):

# Datadog as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: My SLO
  name: my-datadog-slo
  project: my-project
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: datadog
    rawMetric:
      datadog:
        query: avg:trace.postgres.query.duration{*}
  service: my-service
  objectives:
  - target: 0.6
    op: lte
    displayName: rather-bad
    timeSliceTarget: 0.6
    value: 0.003
  - target: 0.99
    op: lte
    displayName: stretched
    timeSliceTarget: 0.99
    value: 0.004
  timeWindows:
  - count: 1
    isRolling: true
    unit: Hour
# Datadog as ratio metric (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: My SLO
  name: my-datadog-slo
  project: my-project
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: datadog
  service: my-service
  objectives:
  - target: 0.7
    countMetrics:
      good:
        datadog:
          query: sum:trace.http.request.hits.by_http_status{http.status_class:2xx}.as_count()
      incremental: false
      total:
        datadog:
          query: sum:trace.http.request.hits.by_http_status{*}.as_count()
    displayName: available1
    value: 1
  timeWindows:
  - count: 1
    isRolling: true
    unit: Hour

Metric queries in Datadog are described in the Datadog documentation under Querying metrics.

💡 Note: It is important to define queries in such a way that they return only one time series.

Example:

❌ Grouping metrics will often result in multiple time series:

✔ The same query without grouping:
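
As a hypothetical illustration (the metric name is an assumption, not from the original queries), a grouped query such as:

avg:system.cpu.user{*} by {host}

returns one time series per host, whereas the same query without the by {host} grouping:

avg:system.cpu.user{*}

returns a single aggregated series suitable for an SLI.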

💡 Note: It is strongly suggested not to use the .rollup() or .moving_rollup() functions in your queries (see Rollup).

The Nobl9 agent uses the enforced rollup described under Rollup interval: enforced vs custom to control the number of points returned from queries. Using .rollup() or .moving_rollup() can change the number of returned points or the way they are aggregated; combined with the time range of each query the Nobl9 agent makes, this can skew the calculated error budgets.
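
For example, using the duration metric from the threshold example above, a query with a forced rollup such as:

avg:trace.postgres.query.duration{*}.rollup(avg, 300)

overrides the agent's enforced rollup with a 5-minute aggregation. Prefer the plain query and let the agent control point density:

avg:trace.postgres.query.duration{*}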

SLO using Dynatrace

Examples of an SLO using Dynatrace as a threshold SLI (raw metric) and as a ratio SLI (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: dynatrace-slo-raw
  displayName: Dynatrace Raw
  project: dynatrace
spec:
  service: dynatrace-demo-service
  indicator: 
    metricSource: 
      name: dynatrace
    rawMetric: 
      dynatrace: 
        metricSelector: >
            builtin:synthetic.http.duration.geo:filter(
            and(in("dt.entity.http_check",entitySelector("type(http_check),
            entityName(~"API Sample~")")),in("dt.entity.synthetic_location",
            entitySelector("type(synthetic_location),entityName(~"N. California~")")))):
            splitBy("dt.entity.http_check","dt.entity.synthetic_location"):
            avg:auto:sort(value(avg,descending)):limit(20)
  timeWindows: 
    - unit: Day
      count: 1
      calendar: 
        startTime: 2020-01-21 12:30:00
        timeZone: America/New_York
  budgetingMethod: Occurrences
  objectives: 
    - displayName: Excellent
      value: 200
      target: 0.8
      op: lte
    - displayName: Good
      value: 250
      target: 0.9
      op: lte
    - displayName: Poor
      value: 300
      target: 0.99
      op: lte
---
apiVersion: n9/v1alpha
kind: SLO
metadata: 
  name: dynatrace-slo-ratio
  displayName: Dynatrace Ratio
  project: dynatrace
spec: 
  budgetingMethod: Occurrences
  indicator: 
    metricSource: 
      kind: Agent
      name: dynatrace
      project: dynatrace
  objectives: 
    - countMetrics: 
        good: 
          dynatrace: 
            metricSelector: >
                builtin:synthetic.http.request."statusCode":filter(
                and(in("dt.entity.http_check_step",entitySelector("type(http_check_step),
                entityName(~"httpbin.org~")")),eq("Status code",SC_2xx),
                in("dt.entity.synthetic_location",entitySelector("type(synthetic_location),
                entityName(~"N. California~")")))):
                splitBy("Status code","dt.entity.http_check_step","dt.entity.synthetic_location"):
                count:auto:sort(value(avg,descending)):limit(20)
        incremental: false
        total: 
          dynatrace: 
            metricSelector: >
                builtin:synthetic.http.request."statusCode":filter(
                and(in("dt.entity.http_check_step",entitySelector("type(http_check_step),
                entityName(~"httpbin.org~")")),in("dt.entity.synthetic_location",
                entitySelector("type(synthetic_location),entityName(~"N. California~")")))):
                splitBy("dt.entity.synthetic_location","dt.entity.http_check_step"):
                count:auto:sort(value(avg,descending)):limit(20)
      displayName: Enough
      target: 0.5
      value: 1
  service: dynatrace-demo-service
  timeWindows: 
    - count: 1
      isRolling: true
      period: 
        begin: "2021-05-05T10:39:55Z"
        end: "2021-05-05T11:39:55Z"
      unit: Hour

The metric specification for Dynatrace has one field:

- metricSelector

The metricSelector can be obtained from the Dynatrace v2 API: in the Custom chart area, select the Try it out button, then in the Data explorer, select the Code tab.

SLO using Elasticsearch

Examples of an SLO using Elasticsearch as a threshold (raw metric) and as a ratio (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: elasticsearch-slo-raw
  displayName: Elastic Search Raw
  project: elastic
spec:
  service: elasticsearch-demo-service
  indicator:
    metricSource:
      name: elastic
    rawMetric:
      elasticsearch:
        query: |
          {
              "query": {
                  "bool": {
                      "must": [
                          {
                              "match": {
                                  "service.name": "weloveourpets_xyz"
                              }
                          }
                      ],
                      "filter": [
                          {
                              "range": {
                                  "@timestamp": {
                                      "gte": "{{.BeginTime}}",
                                      "lte": "{{.EndTime}}"
                                  }
                              }
                          }
                      ]
                  }
              },
              "size": 0,
              "aggs": {
                  "resolution": {
                      "date_histogram": {
                          "field": "@timestamp",
                          "fixed_interval": "{{.Resolution}}",
                          "min_doc_count": 0,
                          "extended_bounds": {
                              "min": "{{.BeginTime}}",
                              "max": "{{.EndTime}}"
                          }
                      },
                      "aggs": {
                          "n9-val": {
                              "avg": {
                                  "field": "transaction.duration.us"
                              }
                          }
                      }
                  }
              }
          }
        index: apm-7.13.3-transaction
  timeWindows:
    - unit: Day
      count: 1
      calendar:
        startTime: 2020-01-21 12:30:00
        timeZone: America/New_York
  budgetingMethod: Occurrences
  objectives:
    - displayName: Excellent
      value: 200
      target: 0.8
      op: lte
    - displayName: Good
      value: 250
      target: 0.9
      op: lte
    - displayName: Poor
      value: 300
      target: 0.99
      op: lte
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: elasticsearch-slo-ratio
  displayName: Elastic Search Ratio
  project: elastic
spec:
  budgetingMethod: Occurrences
  indicator:
    metricSource:
      kind: Agent
      name: elastic
      project: elastic
  objectives:
    - countMetrics:
        good:
          elasticsearch:
            query: |
              {
                  "query": {
                      "bool": {
                          "must": [
                              {
                                  "match": {
                                      "service.name": "weloveourpets_xyz"
                                  }
                              }
                          ],
                          "filter": [
                              {
                                  "range": {
                                      "@timestamp": {
                                          "gte": "{{.BeginTime}}",
                                          "lte": "{{.EndTime}}"
                                      }
                                  }
                              },
                              {
                                  "match": {
                                      "transaction.result": "HTTP 2xx"
                                  }
                              }
                          ]
                      }
                  },
                  "size": 0,
                  "aggs": {
                      "resolution": {
                          "date_histogram": {
                              "field": "@timestamp",
                              "fixed_interval": "{{.Resolution}}",
                              "min_doc_count": 0,
                              "extended_bounds": {
                                  "min": "{{.BeginTime}}",
                                  "max": "{{.EndTime}}""
                              }
                          },
                          "aggs": {
                              "n9-val": {
                                  "value_count": {
                                      "field": "transaction.result"
                                  }
                              }
                          }
                      }
                  }
              }
            index: apm-7.13.3-transaction
        incremental: false
        total:
          elasticsearch:
            query: |
              {
                  "query": {
                      "bool": {
                          "must": [
                              {
                                  "match": {
                                      "service.name": "weloveourpets_xyz"
                                  }
                              }
                          ],
                          "filter": [
                              {
                                  "range": {
                                      "@timestamp": {
                                          "gte": "{{.BeginTime}}",
                                          "lte": "{{.EndTime}}"
                                      }
                                  }
                              }
                          ]
                      }
                  },
                  "size": 0,
                  "aggs": {
                      "resolution": {
                          "date_histogram": {
                              "field": "@timestamp",
                              "fixed_interval": "{{.Resolution}}"
                              "min_doc_count": 0,
                              "extended_bounds": {
                                  "min": "{{.BeginTime}}",
                                  "max": "{{.EndTime}}"
                              }
                          },
                          "aggs": {
                              "n9-val": {
                                  "value_count": {
                                      "field": "transaction.result"
                                  }
                              }
                          }
                      }
                  }
              }
            index: apm-7.13.3-transaction
      displayName: Enough
      target: 0.5
      value: 1
  service: dynatrace-demo-service
  timeWindows:
    - count: 1
      isRolling: true
      unit: Hour

When data from Elastic APM is used, @timestamp is an example of a field that holds the document's timestamp. A different field can be used, depending on the schema in place.

The {{.BeginTime}} and {{.EndTime}} placeholders are mandatory and are replaced by the Nobl9 agent with the correct time range values.

Use the following links in the Elasticsearch guides for context:

The Nobl9 agent requires that the search result be a time series. The agent expects a date_histogram aggregation named resolution, which is the source of the timestamps, with a child metric aggregation named n9-val, which is the source of the values.

{
  "aggs": {
        "resolution": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "{{.Resolution}}",
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": "{{.BeginTime}}",
                    "max": "{{.EndTime}}"
                }
            },
            "aggs": {
                "n9-val": {
                    "avg": {
                        "field": "transaction.duration.us"
                    }
                }
            }
        }
    }
}
  1. Date Histogram Aggregation
    • The recommendation is to use fixed_interval with date_histogram and pass the {{.Resolution}} placeholder as its value. This enables the Nobl9 agent to control the data resolution.
    • The query must not use a fixed_interval longer than 1 minute, because queries are executed every minute for a 1-minute time range.
  2. Date Histogram Aggregation Fixed Intervals
    • The "field": "@timestamp" must match field used in the filter query.
    • Using extended_bounds is recommended with the same placeholders "{{.BeginTime}}", "{{.Endime}}" as a filter query.
  3. Metrics Aggregations
    • The n9-val must be a metric aggregation.
    • For a single-value metric aggregation, its value is used as the value of the time series.
    • For a multi-value metric aggregation, the first non-null value returned is used as the value of the time series.
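For illustration, a multi-value metric aggregation (here a hypothetical percentiles aggregation over transaction.duration.us) returns several values; the first non-null one becomes the value of the time series:

```json
"aggs": {
    "n9-val": {
        "percentiles": {
            "field": "transaction.duration.us",
            "percents": [95]
        }
    }
}
```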

In the following example, null values are skipped.

"aggs": {
    "n9-val": {
        ...
    }
} 
  1. The elasticsearch.index is the name of the index the query is executed against.

SLO using Grafana Loki

Example of SLO using Loki as threshold metric (raw metric):

# Loki as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: n9-kafka-main-cluster-alerts-error-budgets-out-lag-threshold
  project: grafana-loki
spec:
  description: Example of Loki Metric query
  service: grafana-loki-service
  indicator:
    metricSource:
      name: grafana-loki
    rawMetric:
      grafanaLoki:
        logql: sum(sum_over_time({topic="error-budgets-out", consumergroup="alerts", cluster="main"} |= "kafka_consumergroup_lag" | logfmt | line_format "{{.kafka_consumergroup_lag}}" | unwrap kafka_consumergroup_lag [1m]))
  timeWindows:
    - unit: Day
      count: 1
      isRolling: true
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      op: lte
      value: 5
      target: 0.50
    - displayName: Moderate
      op: lte
      value: 10
      target: 0.75

SLO using Graphite

The schema and example of SLO using Graphite as threshold metric:

# Graphite as threshold metric:

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: string
  displayName: string # optional
  project: string
spec:
  description: string     # optional
  service: [service name] # name of the service you defined in the same project as SLO
  indicator:
    metricSource:
      name: [datasource name] # name of the datasource you defined
      project: [project name] # optional if not defined,project is the same as an SLO.
      kind: [resource name]   # optional and defaults to "Agent", possible values: Agent, Direct
    rawMetric:
      # exactly one of possible source types, which depends on selected metricSource for SLO
      graphite: # add this if using graphite
        metricPath: string # metric path, such as "servers.cpu.total"
# Graphite as threshold metric:

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: graphite-slo-1
  project: graphite
spec:
  service: web-service
  indicator:
    metricSource:
      name: graphite-agent
    rawMetric:
      graphite:
        metricPath: carbon.agents.9b365cce.cpuUsage
  timeWindows:
    - unit: Day
      count: 7      
      isRolling: true  
  budgetingMethod: Occurrences  
  objectives:
    - displayName: Good      
      op: lte      
      value: 100      
      target: 0.9

The schema and example of SLO using Graphite as a ratio metric:

# Graphite as ratio metric:

objectives:
  - displayName: string # optional
    op: lte | gte | lt | gt # operator is the comparison method allowed for labeling the SLI
    value: numeric # value used to compare with raw metric values. All objectives of the SLO need to have a unique value.
    target: numeric [0.0, 1.0)          # budget target for the given objective of the SLO
    timeSliceTarget: numeric (0.0, 1.0] # required only when budgetingMethod is set to TimeSlices
    # countMetrics {good, total} should be defined only if rawMetric is not set.
    # If rawMetric is defined on the SLO level, raw data received from a metric source is compared with the objective value.
    # Count metrics good and total have to contain the same source type configuration (for example, graphite).
    countMetrics:
        # true for monotonically increasing counters, false for metrics that can arbitrarily go up and down
        incremental: true | false
        good:
          # exactly one of the possible source types, depending on the selected metricSource for the SLO
          graphite: # add this if using graphite
            metricPath: string # metric path, such as "servers.cpu.total"
        total:
          # exactly one of the possible source types, depending on the selected metricSource for the SLO
          graphite: # add this if using graphite
            metricPath: string # metric path, such as "servers.cpu.total"
The metric specification for Graphite has only one field: metricPath.
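For example, a complete Graphite metric specification looks like this (metric path taken from the schema comment above):

```yaml
graphite:
  metricPath: servers.cpu.total
```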

# Graphite as a ratio metric:

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: graphite-slo-2
  project: default
spec:
  service: web-service
  indicator:
    metricSource:
      name: graphite-agent
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      target: 0.95
      countMetrics:
        incremental: false
        good:
          graphite:
            metricPath: stats_counts.response.200
        total:
          graphite:
            metricPath: stats_counts.response.all

SLO using Lightstep

Examples of SLO using Lightstep: as threshold metric (raw metric) and as ratio metric (count metric):

# Lightstep as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: get-store-p95-latency-rolling
  project: lightstep
spec:
  service: android-service
  indicator:
    metricSource:
      name: lightstep
    rawMetric:
      lightstep:
        streamID: DzpxcSRh
        typeOfData: latency
        percentile: 95
  timeWindows:
    - unit: Day
      count: 7
      isRolling: true
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      op: lte
      value: 150
      target: 0.50
    - displayName: Moderate
      op: lte
      value: 175
      target: 0.75
    - displayName: Annoying
      op: lte
      value: 200
      target: 0.95
# Lightstep as ratio metric (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: get-store-counts-calendar
  project: lightstep
spec:
  service: android-service
  indicator:
    metricSource:
      name: lightstep
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      target: 0.95
      countMetrics:
        incremental: false
        good:
          lightstep:
            streamID: DzpxcSRh
            typeOfData: good
        total:
          lightstep:
            streamID: DzpxcSRh
            typeOfData: total

When Lightstep is used as a ratio (count) metric, the incremental field under spec.objectives.countMetrics must be set to false.

The metric specification for Lightstep has 3 fields: streamID, typeOfData, and percentile (used with typeOfData: latency).

Examples:

lightstep:
  streamID: DzpxcSRh
  typeOfData: latency
  percentile: 95
lightstep:
  streamID: DzpxcSRh
  typeOfData: error_rate
lightstep:
  streamID: DzpxcSRh
  typeOfData: good
lightstep:
  streamID: DzpxcSRh
  typeOfData: total

Lightstep rate limits

Lightstep has very low rate limits for its Streams Timeseries API: 60, 200, and 600 requests per hour for the Community, Professional, and Enterprise licenses respectively (https://api-docs.lightstep.com/reference#rate-limits). The Nobl9 agent makes a request once every 60 seconds, which allows one Lightstep organization to use only 1, 3, or 10 unique metric specs. Lightstep users can request a rate limit increase via Lightstep's customer support: https://docs.lightstep.com/docs/get-support-from-customer-success.
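The numbers above follow from simple arithmetic: each unique metric spec is queried once per minute, that is 60 times per hour, so the number of usable metric specs is the hourly rate limit divided by 60:

```
 60 requests/hour / 60 requests/hour per spec =  1 unique metric spec  (Community)
200 requests/hour / 60 requests/hour per spec ≈  3 unique metric specs (Professional)
600 requests/hour / 60 requests/hour per spec = 10 unique metric specs (Enterprise)
```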

SLO using New Relic

Examples of SLO using New Relic: as threshold metric (raw metric) and as ratio metric (count metric):

# New Relic as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: My SLO
  name: my-slo
  project: my-project
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: my-newrelic-source
    rawMetric:
      newRelic:
        nrql: SELECT average(duration) FROM Transaction TIMESERIES
  service: my-service
  objectives:
  - target: 0.6
    op: lte
    displayName: rather bad
    timeSliceTarget: 0.6
    value: 0.6
  - target: 0.99
    op: lte
    displayName: stretched
    timeSliceTarget: 0.99
    value: 1.2
  timeWindows:
  - count: 1
    isRolling: true
    unit: Hour
# New Relic as ratio metric (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: My SLO
  name: my-slo
  project: my-project
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: my-newrelic-source
  service: my-service
  objectives:
  - target: 0.9
    countMetrics:
      good:
        newRelic:
          nrql: SELECT count(*) FROM Transaction WHERE httpResponseCode IN ('200','301','302') TIMESERIES
      incremental: false
      total:
        newRelic:
          nrql: SELECT count(*) FROM Transaction TIMESERIES
    value: 0
  timeWindows:
  - calendar:
      startTime: "2020-11-14 12:10:00"
      timeZone: Etc/UTC
    count: 1
    isRolling: false
    unit: Day

The metric specification for New Relic always has one mandatory field: nrql.
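For example, a complete New Relic metric specification (query taken from the raw metric example above):

```yaml
newRelic:
  nrql: SELECT average(duration) FROM Transaction TIMESERIES
```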

Details about the New Relic Query Language (NRQL) can be found here:

TIMESERIES clause

Nobl9 always queries for time series data, so the TIMESERIES clause is enforced: Nobl9 adds a missing TIMESERIES clause or overwrites an existing one.

You can add an empty TIMESERIES clause to your NRQL queries or skip it altogether.

💡 Note: Because the TIMESERIES clause is overwritten, users should not pass their own parameters in it.

A clause such as TIMESERIES 1 hour will be overwritten by the agent and set to a different value.
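For illustration, with a hypothetical query: the two forms below behave identically, because the agent discards the 1 hour parameter and substitutes its own interval:

```
SELECT average(duration) FROM Transaction TIMESERIES 1 hour
SELECT average(duration) FROM Transaction TIMESERIES
```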

SINCE, UNTIL clauses

💡 Note: The Nobl9 agent needs control over the queried time range, so you must not use the SINCE and UNTIL clauses explicitly. There is no mechanism that would overwrite these clauses, so using them will cause incorrect results.

One point in results

Queries must be written so that they return only one time series. In other words, you must always query for a single attribute or use a single function (or expression).

Examples:

❌ Incorrect queries:
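The original examples are not preserved here; as an illustration, hypothetical NRQL queries of the following shape return more than one time series (two functions in one query, or one series per FACET value) and are therefore incorrect:

```
SELECT average(duration), max(duration) FROM Transaction TIMESERIES
SELECT count(*) FROM Transaction FACET httpResponseCode TIMESERIES
```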

✔ Correct queries:
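As an illustration, hypothetical NRQL queries of the following shape use a single function and return exactly one time series:

```
SELECT average(duration) FROM Transaction TIMESERIES
SELECT count(*) FROM Transaction WHERE httpResponseCode = '200' TIMESERIES
```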

Other clauses mutually exclusive with TIMESERIES

❌ Other clauses that are mutually exclusive with TIMESERIES clause must not be used.
Example:

SLO using OpenTSDB

Examples of SLO using OpenTSDB: as threshold metric (raw metric) and as ratio metric (count metric):

#OpenTSDB as threshold metric (raw metric)

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: opentsdb-slo-raw
  displayName: OpenTSDB Raw
  project: opentsdb
spec:
  service: opentsdb-demo-service
  indicator:
    metricSource:
      name: opentsdb
    rawMetric:
      opentsdb: >-
        start="{{.BeginTime}}"&end="{{.EndTime}}"&ms=true&m=none:"{{.Resolution}}" -avg-zero:transaction.duration{host=host.01}
  timeWindows:
    - unit: Day
      count: 1
      calendar:
        startTime: 2020-01-21T12:30:00.000Z
        timeZone: America/New_York
  budgetingMethod: Occurrences
  objectives:
    - displayName: Excellent
      value: 200
      target: 0.8
      op: lte
    - displayName: Good
      value: 250
      target: 0.9
      op: lte
    - displayName: Poor
      value: 300
      target: 0.99
      op: lte
#OpenTSDB as ratio metric (count metric)

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: opentsdb-slo-ratio
  displayName: OpenTSDB Ratio
  project: opentsdb
spec:
  budgetingMethod: Occurrences
  indicator:
    metricSource:
      kind: Agent
      name: opentsdb
      project: opentsdb
  objectives:
    - countMetrics:
        good:
          opentsdb: >-
            start="{{.BeginTime}}"&end="{{.EndTime}}"&ms=true&m=none:"{{.Resolution}}"-count-zero:http.code{code=2xx}
        incremental: false
        total:
          opentsdb: >-
            start="{{.BeginTime}}"&end="{{.EndTime}}"&ms=true&m=none:"{{.Resolution}}"-count-zero:http.code{type=http.status_code}
      displayName: Enough
      target: 0.5
      value: 1
  service: opentsdb-demo-service
  timeWindows:
    - count: 1
      isRolling: true
      unit: Hour

OpenTSDB queries

start="{{.BeginTime}}"&end="{{.EndTime}}"}&m=none:"{{.ResolutionTime}}"-p75:test.to.test{tag.name_1=tag.tag_1} parameters in:

References

SLO Using Pingdom

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: my-pingdom-check-raw
  project: pingdom-test
spec:
  description: Example of Pingdom Metric query
  service: pingdom-service
  indicator:
    metricSource:
      name: pingdom
    rawMetric:
      pingdom:
        checkID: 8745322
        status: up

Important Notes:


apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: my-pingdom-check-count-metrics
  project: pingdom-test
spec:
  description: Example of Pingdom count metrics query
  service: pingdom-service
  indicator:
    metricSource:
      kind: Agent
      name: pingdom
      project: pingdom-test
  timeWindows:
    - unit: Day
      count: 28
      isRolling: true
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      value: 1
      target: 0.99
      countMetrics:
        incremental: false
        good:
          pingdom:
            checkId: 8745322
            status: up
        total:
          pingdom:
            checkId: 8745322
            status: up,down

Important notes:

SLO using Prometheus

Examples of SLO using Prometheus as threshold metric (raw metric) and as ratio metric (count metric):

# Prometheus as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: My SLO
  name: my-slo
  project: my-project
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      name: my-prometheus-source
    rawMetric:
      prometheus:
        promql: myapp_server_requestMsec{host="*",job="nginx"}
  service: my-service
  objectives:
  - target: 0.8
    op: lte
    displayName: average
    value: 200
  - target: 0.5
    op: lte
    displayName: so-so
    value: 150
  timeWindows:
    - calendar:
        startTime: "2020-11-14 11:00:00"
        timeZone: Etc/UTC
      count: 1
      isRolling: false
      unit: Day
# Prometheus as ratio metric (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: prometheus-calendar-timeslices-ratio
  name: prometheus-calendar-timeslices-ratio
  project: prometheus
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: prometheus
  service: prometheus-polakpotrafipl
  objectives:
    - target: 0.75
      countMetrics:
        good:
          prometheus:
            promql: sum(production_http_response_time_seconds_hist_bucket{method=~"GET|POST",status=~"2..|3..",le="1"})
        incremental: true
        total:
          prometheus:
            promql: sum(production_http_response_time_seconds_hist_bucket{method=~"GET|POST",le="+Inf"})
      displayName: available1
      timeSliceTarget: 0.75
      value: 1
  timeWindows:
   - calendar:
       startTime: "2020-11-14 11:00:00"
       timeZone: Etc/UTC
     count: 1
     isRolling: false
     unit: Day

The metric specification for Prometheus always has one mandatory field: promql.
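For example, a complete Prometheus metric specification (query taken from the raw metric example above):

```yaml
prometheus:
  promql: myapp_server_requestMsec{host="*",job="nginx"}
```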

SLO using Splunk

Examples of SLO using Splunk as threshold metric (raw metric) and as ratio metric (count metric):

# Splunk as threshold metric (raw metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: bing-raw-calendar
  project: splunk
spec:
  service: devlab
  indicator:
    metricSource:
      name: splunk
    rawMetric:
      splunk:
        query: search source="myapp" latency_p95="*"
        fieldName: latency_p95
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      op: lte
      value: 0.25
      target: 0.50
    - displayName: Moderate
      op: lte
      value: 0.5
      target: 0.75
    - displayName: Annoying
      op: lte
      value: 1.0
      target: 0.95
# Splunk as ratio metric (count metric):

apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: bing-ratio-calendar
  project: splunk
spec:
  service: polakpotrafi
  indicator:
    metricSource:
      name: splunk
  timeWindows:
    - unit: Day
      count: 7
      calendar:
        startTime: 2020-03-09 00:00:00
        timeZone: Europe/Warsaw
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      op: lte
      value: 0.25
      target: 0.50
      countMetrics:
        incremental: true
        good:
          splunk:
            query: search source="myapp" good_count="*"
            fieldName: good_count
        total:
          splunk:
            query: search source="myapp" total_count="*"
            fieldName: total_count

Example of Splunk’s metric specification:

splunk:
  query: search source="myapp" latency_p95="*"
  fieldName: latency_p95

The metric specification for Splunk always has 2 mandatory fields: query and fieldName.

Number of events returned from Splunk queries

The supported search SPL command searches within indexed events. The total number of events can be large, and a query without specific conditions, such as search sourcetype=*, returns all indexed events. A large number of data points sent to Nobl9 could disrupt system performance. Therefore, there is a hard limit of 4 events per minute.

The event rate limit is enforced by the agent in the following way:

  1. If the number of points in a given minute is less than or equal to 4, all points are sent.

  2. If the number of points is greater than 4, only 4 points are sent and the rest are dropped. Points are dropped in such a way that the sent points are evenly spaced out within the set of points in a given minute.

Splunk queries require a return value for n9time and n9value. Use Splunk field extractions to return values using those exact names. The n9time is the actual time, and the n9value is the metric value. The n9time must be a Unix timestamp and the n9value must be a float value.

Typically, you will rename _time to n9time and then rename the field containing the metric value (response_time in the previous example) to n9value. The following is the appendage to your normal query that handles this.
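A minimal sketch of such an appendage, assuming the metric value lives in a field called response_time (substitute the field your query actually produces); the leading "..." stands for your normal search query:

```
... | rename _time as n9time, response_time as n9value | fields n9time, n9value
```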

No support for self-signed Splunk Enterprise

The Nobl9 agent requires that if Splunk Enterprise is configured to use TLS, it must successfully pass certificate validation, which self-signed certificates do not.

SLO using Splunk Observability

Examples of SLO using Splunk Observability as threshold metric (raw metric) and as ratio metric (count metric):

# Splunk Observability as threshold metric (raw metric):

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: tokyo-server-4-latency
    displayName: Server4 Latency [Tokyo]
    project: splunk-observability
  spec:
    description: Latency of Server4 in Tokyo region
    service: splunk-observability-demo-service
    indicator:
      metricSource:
        name: splunk-observability
      rawMetric:
        splunkObservability:
          program: 'sf_metric:demo.trans.latency+AND+demo_customer:samslack.com+AND+demo_datacenter:Tokyo+AND+demo_host:server4'
    timeWindows:
      - unit: Day
        count: 1
        calendar:
          startTime: 2020-01-21 12:30:00
          timeZone: America/New_York
    budgetingMethod: Occurrences
    objectives:
      - displayName: Excellent
        value: 200
        target: 0.8
        op: lte
      - displayName: Good
        value: 250
        target: 0.9
        op: lte
      - displayName: Poor
        value: 300
        target: 0.99
        op: lte
# Splunk Observability as ratio metric (count metric):

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    displayName: Otel accepted metric points ratio
    name: otel-accepted-metric-points-ratio
    project: splunk-observability
  spec:
    budgetingMethod: Occurrences
    indicator:
      metricSource:
        kind: Agent
        name: splunk-observability
        project: splunk-observability
    objectives:
    - countMetrics:
        good:
          splunkObservability:
            program: sf_metric:otelcol_receiver_accepted_metric_points+AND+k8s.cluster.name:main+AND+k8s.node.name:ip-192-168-48-144.eu-central-1.compute.internal+AND+(NOT+sf_tags:(inactive))+AND+receiver:kubeletstats
        incremental: false
        total:
          splunkObservability:
            program: sf_metric:otelcol_exporter_sent_metric_points+AND+k8s.cluster.name:main+AND+k8s.node.name:ip-192-168-48-144.eu-central-1.compute.internal+AND+(NOT+sf_tags:(inactive))
      displayName: Enough
      tag: splunk-observability.otel-accepted-metric-points-ratio.1d000000
      target: 0.5
      value: 1
    service: splunk-observability-demo-service
    timeWindows:
    - count: 1
      isRolling: true
      period:
        begin: "2021-05-05T10:39:55Z"
        end: "2021-05-05T11:39:55Z"
      unit: Hour

The metric specification for Splunk Observability has 1 field: program.
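For example, a complete Splunk Observability metric specification (program taken from the threshold metric example above, shortened):

```yaml
splunkObservability:
  program: 'sf_metric:demo.trans.latency+AND+demo_datacenter:Tokyo+AND+demo_host:server4'
```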

API search syntax is similar to Elasticsearch syntax, which is a combination of key-value pairs with support for the following:

Example query:

sf_metric:demo.trans.latency+AND+demo_customer:samslack.com+AND+demo_datacenter:Tokyo+AND+demo_host:server4

Example valid response from API returns time series with key Ezp7SskA0AQ:

{"data" : 
{"Ezp7SskA0AQ":[ 
    [ 1619179380000, 16.768602751173166 ], [ 1619179440000, 16.72091431043774 ], 
    [ 1619179500000, 16.875384457870155 ], [ 1619179560000, 17.073371941423726 ], 
    [ 1619179620000, 16.787238497968534 ], [ 1619179680000, 16.577687804144333 ], 
    [ 1619179740000, 16.26799526231688 ] 
  ] },"errors" : [ ] }

When more than one time series is returned, the agent rejects the response and reports an error such as: data error: should contain exactly 1 time series, but contains 4.

Example of an invalid response, which will finish with an error:

{"data" : {
"Ezp7SskA0AQ":[ 
    [ 1619179380000, 16.768602751173166 ], [ 1619179440000, 14.290333482407197 ], 
    [ 1619179500000, 14.065543760798436 ], [ 1619179560000, 14.16907882181271 ], 
    [ 1619179980000, 14.130899547241368 ] ],
"Ezp7DP6A4AA" : [ 
    [ 1619179380000, 19.545336804898724 ], [ 1619179440000, 14.290333482407197 ], 
    [ 1619179500000, 14.065543760798436 ], [ 1619179560000, 14.16907882181271 ], 
    [ 1619179980000, 14.130899547241368 ] ],
"Ezp7DHrA0AE" : [ 
    [ 1619179380000, 13.96714567647269 ], [ 1619179440000, 14.290333482407197 ], 
    [ 1619179500000, 14.065543760798436 ], [ 1619179560000, 14.16907882181271 ], 
    [ 1619179980000, 14.130899547241368 ] 
  ]},"errors" : [ ]}

Known limitations

The Splunk Observability API endpoint /timeserieswindow is limited to filtering raw data in JSON format, without any analytics functions and with the default rollup. This means it can easily be used for raw metrics, but it is hard to build count metrics with it.

SLO Using ThousandEyes

Raw metric:

#ThousandEyes as threshold metric - rolling occurrences

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-net-latency-time-rolling-occurrences
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2024796
          testType: net-latency
    timeWindows:
      - unit: Day
        count: 1
        isRolling: true
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good
        op: lte
        value: 40
        target: 0.50
      - displayName: Moderate
        op: lte
        value: 45
        target: 0.75
      - displayName: Annoying
        op: lte
        value: 50
        target: 0.95
#ThousandEyes as threshold metric - calendar occurrences

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-net-loss-calendar-occurrences
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2024796
          testType: net-loss
    timeWindows:
      - unit: Day
        count: 7
        calendar:
          startTime: 2020-03-09 00:00:00
          timeZone: Europe/Warsaw
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good
        op: lte
        value: 40
        target: 0.50
      - displayName: Moderate
        op: lte
        value: 45
        target: 0.75
      - displayName: Annoying
        op: lte
        value: 50
        target: 0.95

More threshold metric examples:

#ThousandEyes as threshold metric - rolling occurrences

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-web-page-load-time-rolling-occurrences
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2280492
          testType: web-page-load
    timeWindows:
      - unit: Hour
        count: 1
        isRolling: true
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good
        op: lte
        value: 75
        target: 0.90
#ThousandEyes as threshold metric - rolling occurrences

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-response-time-rolling-occurrences
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2014018
          testType: http-response-time
    timeWindows:
      - unit: Hour
        count: 1
        isRolling: true
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good
        op: lte
        value: 75
        target: 0.90
#ThousandEyes as threshold metric - calendar occurrences

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-net-latency-calendar-occurrences
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2014018
          testType: net-latency
    timeWindows:
      - unit: Day
        count: 1
        calendar:
          startTime: 2020-03-09 00:00:00
          timeZone: Europe/Warsaw
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good
        op: lte
        value: 75
        target: 0.90
#ThousandEyes as threshold metric - rolling timeslices
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-web-dom-load-rolling-timeslices
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2280492
          testType: web-dom-load
    timeWindows:
      - unit: Day
        count: 7
        isRolling: true
    budgetingMethod: Timeslices
    objectives:
      - displayName: Good
        op: lte
        value: 40
        target: 0.50
        timeSliceTarget: 0.50
      - displayName: Moderate
        op: lte
        value: 45
        target: 0.75
        timeSliceTarget: 0.75
      - displayName: Annoying
        op: lte
        value: 50
        target: 0.95
        timeSliceTarget: 0.95
#ThousandEyes as threshold metric - calendar timeslices

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-net-latency-calendar-timeslices
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2024796
          testType: net-latency
    timeWindows:
      - unit: Month
        count: 1
        calendar:
          startTime: 2020-03-09 00:00:00
          timeZone: Europe/Warsaw
    budgetingMethod: Timeslices
    objectives:
      - displayName: Good
        op: lte
        value: 40
        target: 0.50
        timeSliceTarget: 0.50
      - displayName: Moderate
        op: lte
        value: 45
        target: 0.75
        timeSliceTarget: 0.75
      - displayName: Annoying
        op: lte
        value: 50
        target: 0.95
        timeSliceTarget: 0.95
#ThousandEyes as threshold metric - rolling timeslices
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-net-loss-rolling-timeslices
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2014018
          testType: net-loss
    timeWindows:
      - unit: Day
        count: 1
        isRolling: true
    budgetingMethod: Timeslices
    objectives:
      - displayName: Good
        op: lte
        value: 75
        target: 0.90
        timeSliceTarget: 0.9
#ThousandEyes as threshold metric - calendar timeslices

- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: my-thousandeyes-response-time-calendar-timeslices
    project: thousandeyes
  spec:
    service: thousandeyes
    indicator:
      metricSource:
        name: thousandeyes
      rawMetric:
        thousandEyes:
          testID: 2014018
          testType: http-response-time
    timeWindows:
      - unit: Day
        count: 7
        calendar:
          startTime: 2020-03-09 00:00:00
          timeZone: America/New_York
    budgetingMethod: Timeslices
    objectives:
      - displayName: Good
        op: lte
        value: 75
        target: 0.90
        timeSliceTarget: 0.90

The ThousandEyes metric specification has two fields: testID and testType.

💡 Note: for details on ThousandEyes metric types, check the ThousandEyes Knowledge Base and ThousandEyes End-to-end metrics.

❗ Caution: The testType field is not supported by Agent versions below 0.33.0 (older Agents treat every testID as pointing to the net-latency test type). If you want to apply testType, update the Agent to version 0.33.0 or above.

Objectives

Objectives are the thresholds for your SLOs. You can use objectives to define the tolerance levels for your metrics.

objectives:
  - displayName: string # optional
    op: lte | gte | lt | gt # comparison operator used to label the SLI
    value: numeric # value used to compare metric values; all objectives of an SLO need to have a unique value
    target: numeric [0.0, 1.0) # budget target for the given objective of the SLO
    timeSliceTarget: numeric (0.0, 1.0] # required only when budgetingMethod is set to Timeslices
    # countMetrics {good, total} should be defined only if rawMetric is not set.
    # If rawMetric is defined on the SLO level, raw data received from a metric source is compared with the objective value.
    # The good and total count metrics have to use the same source type configuration (for example, both prometheus).
    countMetrics:
      incremental: true | false
      good:
        # exactly one metric type is allowed; it depends on the metricSource selected for the SLO
        prometheus: # add this if using Prometheus
          promql: # query such as 'cpu_usage_user{cpu="cpu-total"}'
        # or
        newRelic: # add this if using New Relic
          nrql: # query such as "SELECT average(duration * 1000) FROM Transaction TIMESERIES"
        # or
        datadog: # add this if using Datadog
          query: # query such as "sum:ingest.ok{*}"
        # or
        appDynamics:
          applicationName: # AppDynamics application name such as "n9"
          metricPath: # AppDynamics metric path such as "End User Experience|App|End User Response Time"
        # lightstep & splunk are also supported
      total:
        # exactly one metric type is allowed; it depends on the metricSource selected for the SLO
        prometheus: # add this if using Prometheus
          promql: # query such as 'cpu_usage_user{cpu="cpu-total"}'
        # or
        newRelic: # add this if using New Relic
          nrql: # query such as "SELECT average(duration * 1000) FROM Transaction TIMESERIES"
        # or
        datadog: # add this if using Datadog
          query: # query such as "sum:ingest.ok{*}"
        # or
        appDynamics:
          applicationName: # AppDynamics application name such as "n9"
          metricPath: # AppDynamics metric path such as "End User Experience|App|End User Response Time"
  - displayName: string # optional
    value: numeric
    target: numeric
    op: lte | gte | lt | gt # comparison operator used to label the SLI
    timeSliceTarget: numeric # required only when budgetingMethod is set to Timeslices
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: webapp-frontend
  displayName: Webapp frontend service
  project: default
spec:
  description: SLO tracking the up time for our frontend service
  service: webapp-service #name of the service you defined
  indicator:
    metricSource:
      name: prometheus-source
  timeWindows:
    - unit: Week
      count: 1
      calendar:
        startTime: 2020-01-21 12:30:00 # date with time in 24h format
        timeZone: America/New_York # name as in IANA Time Zone Database
      isRolling: false
  budgetingMethod: Occurrences 
  objectives:
    - displayName: string
      op: lte 
      value: 0.95
      target: 0.99
      countMetrics:
        incremental: true
        good:
          prometheus:
            promql: "sample query"
        total:
          prometheus:
            promql: "sample query"
  alertPolicies:
    - pagerduty-alert

Notes

Agent

The Agent is middleware between the Nobl9 app and an external data source. The Agent gathers metrics data and sends it to the Nobl9 app. Agents need to be installed on the customer's server.

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: string
  displayName: string # optional
  project: default
spec:
  description: string #optional
  sourceOf:
  - Metrics
  - Services
# only one type of source configuration is allowed for agent
  prometheus:
     url: # base URL to the Prometheus server
  # or
  datadog:
     site: "eu|com" # datadog instance eu or com
  # or
  newRelic:
     accountId: # New Relic account ID (int)
  # or
  appDynamics:
     url: # base URL to an AppDynamics Controller
  # or
  lightstep:
     organization: # organization name
     project:      # project name
  # or
  splunk:
     url: # base API URL of the Splunk Search app
  # or
  splunkObservability:
     url: # base API URL of the Splunk Observability app

Agent using Amazon CloudWatch

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: cloudwatch
  displayName: AWS CloudWatch
  project: cloudwatch
spec:
  description: Integration with CloudWatch
  sourceOf:
    - Metrics
  cloudWatch: {}

Agent using Amazon Managed Service for Prometheus

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: amazon-prometheus-agent
  displayName: Amazon Prometheus Agent
  project: my-amazon-prometheus-project
spec:
  description: Agent settings for Amazon Managed Service for Prometheus datasource
  sourceOf:
  - Metrics
  - Services
  amazonPrometheus:
    url: https://aps-workspaces.eu-central-1.amazonaws.com/workspaces/some_workspace_id
    region: eu-central-1

Important notes:

Caution: The Nobl9 Agent makes one request to the API per minute per unique query. Make sure that your Amazon Managed Service for Prometheus server can handle the additional traffic. For more details, go to AMP Quotas.

Agent using AppDynamics

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: appdynamics-agent
  displayName: AppDynamics Agent
  project: default
spec:
  description: Agent settings for appdynamics
  sourceOf:
  - Metrics
  - Services
  appDynamics:
    # [Mandatory] Base URL to an AppDynamics Controller
    # https://docs.appdynamics.com/display/PRO21/AppDynamics+Concepts
    url: https://data202103112323078.saas.appdynamics.com

Agent using BigQuery

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: bigquery
  displayName: BigQuery Agent
  project: default
spec:
  description: BigQuery description
  sourceOf:
    - Metrics
  bigQuery: {}

Before using the BigQuery Agent, you will need to know your projectID and location.

Important note:

The minimal set of permissions required for the BigQuery agent connection is:

bigquery.datasets.get
bigquery.jobs.create
bigquery.jobs.list
bigquery.models.getData
bigquery.models.getMetadata
bigquery.tables.getData

Agent using Datadog

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: datadog
  project: datadog
spec:
  datadog:
    site: com
  sourceOf:
    - Metrics
    - Services

The Datadog Agent specification has one field: site.

When deploying the Nobl9 Agent, you need to provide the API Key and Application Key through the DD_API_KEY and DD_APPLICATION_KEY environment variables.

The procedure for obtaining both keys is documented here:

Rate limits

Requests to Datadog’s API are rate limited (see Rate Limits).

The Nobl9 Agent uses the Query Timeseries API and tries to optimize the number of requests made to Datadog. In the optimistic scenario it uses 60 requests per hour per data source, but in the pessimistic scenario it can use 60 requests per hour per unique query.

💡 Note: The information in the above paragraph is subject to change in future releases of the Nobl9 Agent. Be aware of these limitations and request an increase of the appropriate rate limit from Datadog support if necessary.

Agent using Dynatrace

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: dynatrace
  displayName: Dynatrace Agent
  project: dynatrace
spec:
  sourceOf:
    - Metrics
    - Services
  dynatrace:
    url: https://rxh70845.live.dynatrace.com/

The Dynatrace Agent specification has one field: url.

The expected URL format is https://<your-environment-id>.live.dynatrace.com (for SaaS deployments, as in the example above).

Based on the Dynatrace documentation, when deploying the Nobl9 Agent for Dynatrace, you must provide the DYNATRACE_TOKEN environment variable for authentication, as described in Dynatrace API - Tokens and authentication. There is a placeholder for that value in the configuration obtained from the installation instructions in the Nobl9 web app.

Agent using Elasticsearch


# Elasticsearch schema 

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: string
  displayName: string # optional
  project: default
spec:
  sourceOf:
    - Metrics
    - Services
  elasticsearch:
    url: string

#Elasticsearch Agent example

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: elasticsearch
  displayName: Elastic Search Agent
  project: elastic
spec:
  sourceOf:
    - Metrics
    - Services
  elasticsearch:
    url: https://observability-deployment-946814.es.eu-central-1.aws.cloud.es.io:9243

The spec.elasticsearch.url must point to the Elasticsearch app. If you are using Elastic Cloud, the URL can be obtained as follows:

  1. Log in to Elastic Cloud.
  2. Select your deployment.
  3. Open the deployment details and copy the Elasticsearch endpoint.
  4. Provide the API Key (required) in the ELASTICSEARCH_TOKEN environment variable for authentication when you deploy the Nobl9 Agent for Elasticsearch.

Agent using Grafana Loki

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: grafana-loki-agent
  displayName: Grafana Loki Agent
  project: default
spec:
  description: Agent settings for Grafana Loki datasource
  sourceOf:
  - Metrics
  - Services
  grafanaLoki:
    url: http://loki.example.com

Agent using Graphite

# Graphite Agent schema:

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: string
  displayName: string # optional
  project: default
spec:
  description: string # optional
  sourceOf:
  - Metrics
  - Services
# only one type of source configuration is allowed for agent
  graphite:
    url: string # render API URL endpoint of Graphite's instance
# Graphite Agent example:

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: graphite-agent
  displayName: Graphite Agent
  project: graphite
spec:
  description: Agent settings for graphite datasource
  sourceOf:
  - Metrics
  - Services
  graphite:
    url: http://app.grap.us1.com

Agent using Lightstep

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: lightstep
  displayName: Lightstep Agent
  project: lightstep
spec:
  description: Agent settings for Lightstep
  sourceOf:
    - Metrics
    - Services
  lightstep:
     organization: LightStep-Play
     project: play

Lightstep is a SaaS product and does not require the url field; it needs only the name of the organization registered in Lightstep and the name of the project.

When deploying the Nobl9 Agent for Lightstep, you must provide the LS_APP_TOKEN environment variable for authentication with the Lightstep Streams Timeseries API. There is a placeholder for that value in the configuration obtained from the installation instructions in the Nobl9 web app.

See also: How to obtain access token for Lightstep

Agent using New Relic

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: newrelic
  displayName: Newrelic Agent
  project: newrelic
spec:
  description: Agent settings for newrelic datasource
  sourceOf:
  - Metrics
  - Services
  newRelic:
    accountId: 21436587

The New Relic Agent specification has one field: accountId.

💡 Note: A user may be assigned to multiple accounts. In any case, the account ID can be obtained in the same way.

More on account ID:

How to obtain Account ID from New Relic:

Rate limits

New Relic has very broad rate limits. Nonetheless, users should be aware of them:

Agent using OpenTSDB

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: opentsdb
  displayName: OpenTSDB Agent
  project: opentsdb
spec:
  sourceOf:
    - Metrics
    - Services
  opentsdb:
    url: 'http://localhost:4242'

Agent using Pingdom

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: pingdom
  displayName: Pingdom
  project: my-pingdom
spec:
  sourceOf:
    - Metrics
    - Services
  pingdom: {}

Agent using Prometheus

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: prometheus-agent
  displayName: Prometheus Agent
  project: default
spec:
  description: Agent settings for prometheus datasource
  sourceOf:
  - Metrics
  - Services
  prometheus:
    url: http://prometheus.example.com

The Prometheus Agent specification has one field: url.

The Prometheus Agent makes requests to the Range queries API endpoint in the form /api/v1/query_range. For example:

GET /api/v1/query_range
POST /api/v1/query_range

Hence, do not include the above API path in the URL; specify only the base URL of the Prometheus server. For example, if the Prometheus server is available at http://prometheus.example.com and the API is accessed via http://prometheus.example.com/api/v1, use only http://prometheus.example.com.

Other APIs and web UIs have similar path endings, which should also be omitted, such as the /graph part of the path.

The Prometheus integration does not read data exposed by services in the Prometheus exposition format, usually under the /metrics path. Do not set the URL to metrics exposed directly from such a service.
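To make the rule concrete, here is a minimal sketch of correct and incorrect url values for the example server above:

```yaml
# Correct: only the base URL of the Prometheus server
prometheus:
  url: http://prometheus.example.com

# Incorrect: API path, UI path, or metrics endpoint included
# url: http://prometheus.example.com/api/v1
# url: http://prometheus.example.com/graph
# url: http://prometheus.example.com/metrics
```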

Cortex Support with Nobl9 Prometheus Agent

Cortex is a database based on Prometheus with a compatible API. Therefore, it is possible to use Cortex with the Nobl9 Prometheus Agent.

Cortex cluster setup is out of the scope of this document and is described in the Cortex documentation. Cortex deployment can be simplified with the official Helm chart.

As described in the Cortex Architecture documentation, the Prometheus API is exposed by Nginx under the default address http://cortex-nginx/prometheus. This address can be used as the Prometheus URL in the Agent configuration panel. The default Prometheus endpoint can be changed according to the API documentation. If the endpoint configuration was changed, make sure that the address http://cortex-nginx/api/v1/query_range is accessible to the Agent.
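Putting it together, a minimal sketch of an Agent pointed at the default Cortex address (the metadata names are illustrative):

```yaml
apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: cortex-prometheus-agent
  project: default
spec:
  sourceOf:
  - Metrics
  prometheus:
    url: http://cortex-nginx/prometheus # default Cortex Prometheus endpoint
```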

Thanos support with Nobl9 Prometheus Agent

Thanos is a highly available Prometheus setup and can be used with the Nobl9 Prometheus Agent.

Thanos cluster setup is out of the scope of this document and is described in the Thanos components documentation.

Thanos exposes the Prometheus API through the Querier component. The Querier address must be used as the Prometheus URL in the Nobl9 Agent configuration.
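For example, assuming the Querier is reachable at an illustrative address such as http://thanos-querier.example.com, the Agent spec would be:

```yaml
apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: thanos-prometheus-agent
  project: default
spec:
  sourceOf:
  - Metrics
  prometheus:
    url: http://thanos-querier.example.com # address of the Thanos Querier
```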

Agent using Splunk

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: splunk
  project: splunk
spec:
  sourceOf:
    - Metrics
    - Services
  splunk:
    url: https://splunk.example.com:8089/services

The Splunk Agent configuration accepts only a single parameter: url. It has to point to the base API URL of the Splunk Search app, which usually has the form https://SPLUNK_BASE_URL:PORT_NUMBER/services, where:

SPLUNK_BASE_URL - for Splunk Enterprise, the base URL is configured during the deployment of the Splunk software.

PORT_NUMBER - the default API port is 8089. It is recommended that you contact your Splunk admin to get your API Token and to verify the correct URL to connect to.

When deploying the Nobl9 Agent for Splunk, you must provide either the SPLUNK_APP_TOKEN variable, or the SPLUNK_USER and SPLUNK_PASSWORD environment variables, for authentication with the Splunk Search app REST API. There are placeholders for these values in the configuration obtained from the installation instructions in the Nobl9 web app.

See also:

Splunk Enterprise API rate limits are configured by its administrators and have to be high enough to accommodate searches from the Nobl9 Agent. The Nobl9 Agent makes one query per minute per unique query and fieldName combination.

Agent using Splunk Observability

- apiVersion: n9/v1alpha
  kind: Agent
  metadata:
    name: splunk-observability
    displayName: Splunk Observability
    project: splunk-observability
  spec:
    sourceOf:
      - Metrics
      - Services
    splunkObservability:
      realm: us1

Splunk Observability is a SaaS product, but the realm (region) needs to be provided. See Realms in endpoints.

When deploying the Nobl9 Agent for Splunk Observability, you must provide the SPLUNK_OBSERVABILITY_ACCESS_TOKEN environment variable for authentication with an organization API access token (see Create an access token). There is a placeholder for that value in the configuration obtained from the installation instructions in the Nobl9 web app.

See also:

Agent using ThousandEyes

- apiVersion: n9/v1alpha
  kind: Agent
  metadata:
    name: thousandeyes
    displayName: ThousandEyes
    project: thousandeyes
  spec:
    sourceOf:
      - Metrics
      - Services
    thousandEyes: {}

Direct

Direct gathers metrics data directly from the external source based on provided credentials. Customers do not need to install anything on their servers.

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: string
  displayName: string # optional
  project: default
spec:
  description: string #optional
  sourceOf:
  - Metrics
  - Services
  # only one type of source configuration is allowed for direct
  datadog:
    site: "eu|com"         # datadog instance eu or com
    apiKey: string         # datadog api key
    applicationKey: string # datadog application key
  # or 
  newRelic:
    accountId: string # New Relic account ID (int)
    insightsQueryKey: string # New Relic Insights query key; can also be provided as an environment variable

Direct Amazon CloudWatch

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: cloudwatch-direct
  displayName: AWS CloudWatch
  project: cloudwatch-direct
spec:
  description: Direct integration with CloudWatch
  sourceOf:
    - Metrics
  cloudWatch:
    accessKeyID: "" #secret
    secretAccessKey: "" #secret

Direct AppDynamics

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: appdynamics-direct
  displayName: AppDynamics direct
  project: appdynamics-direct
spec:
  description: AppDynamics direct integration
  sourceOf:
    - Metrics
    - Services
  appDynamics:
    url: "example-url"
    clientID: "example-client-id"
    clientSecret: someSecret # secret

Direct BigQuery

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: bigquery-direct
  displayName: BigQuery direct
  project: bigquery-direct
spec:
  description: Direct integration with BigQuery
  sourceOf:
    - Metrics
  bigQuery:
    serviceAccountKey: |-
      # secret, embed here GCP credentials.json

Direct Datadog

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: datadog-direct
  displayName: Datadog direct
  project: datadog-direct
spec:
  description: direct integration with Datadog
  sourceOf: # One or many values from this list are allowed: Metrics, Services
    - Metrics
    - Services
  datadog:
    site: com
    apiKey: "" # secret
    applicationKey: "" # secret

Direct NewRelic

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: newrelic-direct
  displayName: Newrelic direct
  project: newrelic-direct
spec:
  description: direct integration with Newrelic
  sourceOf: # One or many values from this list are allowed: Metrics, Services
    - Metrics
    - Services
  newRelic:
    accountId: 1437038
    insightsQueryKey: "" # secret

Direct Pingdom

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: pingdom-direct
  displayName: my-pingdom
  project: my-pingdom
spec:
  description: Direct integration with Pingdom
  sourceOf:
    - Metrics
    - Services
  pingdom:
    apiToken: "" #secret

Direct Splunk

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: splunk-direct
  displayName: Splunk direct
  project: splunk-direct
spec:
  description: Direct integration with Splunk
  sourceOf:
    - Metrics
    - Services
  splunk:
    accessToken: "" #secret
    url: "example-url"

Direct Splunk Observability

apiVersion: n9/v1alpha
kind: Direct
metadata:
  name: splunk-observability-direct
  displayName: Splunk Observability direct
  project: splunk-observability-direct
spec:
  description: Direct integration with Splunk Observability
  sourceOf:
    - Metrics
    - Services
  splunkObservability:
    realm: us1
    accessToken: example-access-token #secret

Direct ThousandEyes

- apiVersion: n9/v1alpha
  kind: Direct
  metadata:
    name: thousandeyes
    displayName: Thousand Eyes Direct
    project: thousandeyes-direct
  spec:
    description: Direct integration with ThousandEyes
    sourceOf:
      - Metrics
      - Services
    thousandEyes:
      oauthBearerToken: example-bearer-token

When the Nobl9 Agent is deployed for ThousandEyes, you must provide the THOUSANDEYES_OAUTH_BEARER_TOKEN environment variable for authentication with the ThousandEyes API. To get the OAuth Bearer Token:

  1. Log in to your ThousandEyes account.
  2. Navigate to Account Settings.
  3. Select Users and Roles.
  4. Navigate to the bottom of the page and you will see User API Tokens.
  5. Select OAuth Bearer Token.
    Currently, Nobl9 only supports OAUTH_BEARER_TOKEN.
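The token obtained above is passed to the deployment through the environment. Below is a hedged sketch of a Kubernetes container env entry; the Secret name and key are hypothetical, and the actual manifest comes from the installation instructions in the Nobl9 web app:

```yaml
env:
  - name: THOUSANDEYES_OAUTH_BEARER_TOKEN
    valueFrom:
      secretKeyRef:
        name: nobl9-agent-secrets # hypothetical Secret name
        key: thousandeyes-oauth-bearer-token # hypothetical key
```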

Project

Projects are the primary grouping of resources in Nobl9. For more details, refer to the Projects in the Nobl9 Platform section.

To create a Project in sloctl, apply the following YAML:

apiVersion: n9/v1alpha
kind: Project
metadata:
  name: my-project # mandatory
spec:
  description: "" # optional

Notes:


RoleBinding

Organization Admins can manage users in their Organization, and Project Owners can manage users in their Projects, through the sloctl tool using RoleBindings.

Assumptions and Validations

Applying Organization Role Binding

You can configure an Organization-binding role for a given user in sloctl, for example:

apiVersion: n9/v1alpha
kind: RoleBinding
metadata:
    name: organization-admin-adam
spec:
    user: 00u3ywkof3cTkMLOH4x7 # User ID
    roleRef: organization-admin # Existing organization role (since projectRef is empty)

💡 Note: The User ID for the user field can be retrieved from Settings > Account and Settings > Users in the UI.

Applying Project Role Binding

You can configure a Project-binding role for a given user in sloctl, for example:

apiVersion: n9/v1alpha
kind: RoleBinding
metadata:
    name: project-owner-adam
spec:
    user: 00u3ywkof3cTkMLOH4x7
    roleRef: project-owner # Existing project role.
    projectRef: default # Project needs to exist

Validation Errors

The following are common role-binding errors that users can experience in sloctl:

AlertMethod

When an alert is triggered, Nobl9 enables you to send the notification to an external tool or a REST endpoint (WebService).

Discord Alert Method

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: string # Name of the Integration
  displayName: string # optional
  project: default
spec:
  description: string # optional
  discord:
    url: # URL to Discord webhook
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: discord-notification
  displayName: Discord notification
  project: default
spec:
  description: Sends message to a Discord channel through webhook
  discord:
    # Nobl9 #general Discord channel
    url: https://discord.com/api/webhooks/809803263775211571/D4-5q51DehrBpOAFND6naV8MgCQwmu1vpAwXrO8vPVflFt1bo6J0wMXzvFAttb_2CRjv

Email Alert Method

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: string
  displayName: string # optional
spec:
  description: string # optional
  email:
    to:
      - string # validated as a correct email address; max. number of recipients is 10
    cc:
      - string # validated as a correct email address; max. number of recipients is 10
    bcc:
      - string # validated as a correct email address; max. number of recipients is 10
    subject: string # arrays ($alert_policy_conditions[]) are not supported in this field
    body: string # all variables ($variableName) are supported in this field
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: email-notification
  displayName: Email notification
spec:
  description: Sends email notification to selected recipients
  email:
    to:
      - alerts-tests@example.com
    cc:
      - alerts-tests+cc@example.com
    bcc:
      - alerts-tests+bcc@example.com
    subject: Your SLO $slo_name needs attention! $slo_labels_text
    body: |+
      $alert_policy_name has triggered with the following conditions:
      $alert_policy_conditions[]
      Time: $timestamp
      Severity: $severity
      Project: $project_name
      Service: $service_name
      Organization: $organization
      Labels:
       SLO: $slo_labels_text
       Service: $service_labels_text
       Alert Policy: $alert_policy_labels_text

YAML for the Email integration supports a custom notification message template. You can customize the template with variables in the following format: $variable_name.

List of all supported variables:

slo_labels_text, service_labels_text, and alert_policy_labels_text are rendered as comma-separated key-value pairs, for example: key1=value1,key2=value2

Jira Alert Method

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: string
  displayName: string
  project: string
spec:
  jira:
    url: string # requires HTTPS
    username: string
    apiToken: string #secret
    projectKey: string
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: jira-notification
spec:
  jira:
    url: https://mycompany.atlassian.net/
    username: jira-alerts@mycompany.com
    apiToken: "secret key"
    projectKey: "AT"

Creating a Jira alert requires you to customize the following:

💡 Please note:

MS Teams Alert Method

You can create alerts and notifications for your MS Teams account.

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: string
  displayName: string
  project: string
spec:
  msteams:
    url: string # requires HTTPS, secret field
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: teams-notification
  displayName: MSTeams notification
spec:
  description: Send message to MSTeams channel
  msteams:
    url: https://webhook.office.com/webhookb2/12345

The only field specific to MS Teams is url. This field is a secret and therefore will be replaced with [hidden] string when returned from sloctl.

Opsgenie Alert Method

Two authentication methods are supported for Opsgenie API integration:

The choice between the two authentication methods is offered for convenience, as some users may only have access to one of the methods.

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: opsgenie-notification-key
  displayName: Opsgenie notification with GenieKey
spec:
  description: Sends HTTP request to Opsgenie
  opsgenie:
    auth: GenieKey a5983bf6-e378-4c97-a6ab-8be3589e190f
    url: https://api.opsgenie.com
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: opsgenie-notification-basic
  displayName: Opsgenie notification with Basic
spec:
  description: Sends HTTP request to Opsgenie
  opsgenie:
    auth: Basic YTU5ODNiZjYtZTM3OC00Yzk3LWE2YWItOGJlMzU4OWUxOTBm
    url: https://api.opsgenie.com

Pagerduty Alert Method

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: string # Name of the Integration
  displayName: string # optional
  project: default
spec:
  description: string #optional
  pagerDuty:
    integrationKey: # PagerDuty integration key
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: pagerduty-notification
  displayName: PagerDuty notification
  project: default
spec:
  description: Sends notification to PagerDuty endpoint
  pagerDuty:
    integrationKey: "12345678901234567890123456789012"

Slack Alert Method

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: string # Name of the Integration
  displayName: string #optional
  project: default
spec:
  description: string # optional
  slack:
    url: # URL to Slack webhook
apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: slack-notification
  displayName: slack notification
  project: default
spec:
  description: Sends notification to a Slack channel
  slack:
    url: https://hooks.slack.com/services/1234567890/abcdef

ServiceNow Alert Method

You can create alerts and notifications for your ServiceNow account.

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: servicenow-notification
  displayName: ServiceNow notification
spec:
  description: Sends HTTP request to ServiceNow
  servicenow:
    username: user
    password: pass
    instanceid: dev55555

You must know the following before creating your ServiceNow alert:

  1. Enter your ServiceNow username.
  2. Enter your ServiceNow password.
  3. Enter an instanceid.
    An instanceid is a globally unique ID across all ServiceNow instances. Check the /stats.do page to view the instanceid for any instance.

Webhook Alert Method

apiVersion: n9/v1alpha
kind: AlertMethod
metadata:
  name: webhook-notification
  displayName: Webhook notification
  project: default
spec:
  description: Sends HTTP request to custom webhook
  webhook:
    url: https://sample-end-point/notify

YAML for the Webhook integration supports a custom notification message template. The template can be specified in two ways:

  1. Only variables are specified, and the notification message is generated from them.
    For example, the following YAML specification

    apiVersion: n9/v1alpha
    kind: AlertMethod
    ...
    spec:
      webhook:
        url: https://hook.web
        templateFields:
          - slo_name
          - slo_details_link
    

    will yield a notification message:

    {
      "slo_name": "Test SLO",
      "slo_details_link": "https://main.nobl9.dev/slo/details?project=proj1&name=test_slo"
    }
    
  2. A full message template is specified, with variables in form $variable_name.
    For example, the following YAML specification

    apiVersion: n9/v1alpha
    kind: AlertMethod
    metadata:
      displayName: Webhook Custom
      name: webhook-custom
      project: hawrus-of-puppets
    spec:
      description: ""
      webhook:
        template: |-
          {
           "message": "Your SLO $slo_name needs attention!",
           "timestamp": "$timestamp",
           "severity": "$severity",
           "slo": "$slo_name",
           "project": "$project_name",
           "organization": "$organization",
           "alert_policy": "$alert_policy_name",
           "alerting_conditions": $alert_policy_conditions[],
           "service": "$service_name",
           "labels": {
            "slo": "$slo_labels_text",
            "service": "$service_labels_text",
            "alert_policy": "$alert_policy_labels_text"
           }
          }
        url: '[hidden]'
    

List of all supported variables:

iso_timestamp prints the date in the RFC3339 format ("2006-01-02T15:04:05Z07:00").

The difference between alert_policy_conditions[] and alert_policy_conditions_text is that alert_policy_conditions[] creates a valid JSON array of conditions as strings, whereas alert_policy_conditions_text creates a single string field.
Here is an example of generated messages:

{
  "text": "Remaining error budget is 10%, Error budget would be exhausted in 15 minutes and this condition lasts for 1 hour",
  "array": [
    "Remaining error budget is 10%",
    "Error budget would be exhausted in 15 minutes and this condition lasts for 1 hour"
  ]
}

slo_labels_text, service_labels_text, and alert_policy_labels_text are rendered as comma-separated key-value pairs, for example: key1=value1,key2=value2

The Webhook integration definition requires one (and only one) of these entries: template or templateFields.

Testing Alert Method

Users can test their Alert Method configuration to verify that it is set up correctly. You can test an Alert Method for every type of notification service supported on the Nobl9 Platform.

Follow the steps below to test your Alert Method:

  1. Configure your Alert Method (for details, go here).
  2. In the Create Alert Method wizard, click the ‘Add alert method’ button in the lower right corner.
  3. In the Details screen, click the ‘Test’ button in the upper right corner.
  4. If the Alert Method’s configuration is correct, Nobl9 will prompt the following message next to the ‘Test’ button:

    • Check the service your Alert Method is integrated with and review the alert message you created. The following is an example of a test e-mail alert:

    • If the configuration is incorrect, Nobl9 displays an error message with relevant details, for example:

    💡 Note: For email alerts, there’s a 300-second cooldown interval during which you can’t re-run a test.

    ❗ Caution: Error messages depend on the response from an external Alert Method source, for example:

You can also test your existing Alert Method. To do that:

  1. Go to Integrations > Alert Methods.
  2. In the list, choose the Alert Method you want to test, and click it.
  3. In the Details screen, click the ‘Test’ button in the upper right corner.

AlertPolicy

An Alert Policy defines when to notify the configured Alert Methods. An Alert Policy accepts up to 7 conditions, and all of them must be satisfied to trigger an alert.

apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: string
  displayName: string
  project: string
spec:
  description: string
  severity: Low | Medium | High
  conditions: # up to 7 conditions; all must be satisfied to trigger an alert
    - measurement: timeToBurnBudget | averageBurnRate | burnedBudget
      value: string or numeric
      op: lt | gt | lte | gte
      lastsFor: time duration # default 0 (seconds | minutes | hours), for example "5m"
# From 0 to 5 integrations allowed for the alert policy
  integrations:
    - name: string # name of the integration defined earlier
      project: string # optional; if not defined, the project is the same as the Alert Policy's
apiVersion: n9/v1alpha
kind: AlertPolicy
metadata:
  name: budget-is-burning-too-fast
  displayName: Budget is burning too fast
  project: default
spec:
  description: Error budget burn rate is too high
  severity: Medium
  conditions:
    - measurement: timeToBurnBudget
      value: "24h"
      op: lt
    - measurement: averageBurnRate
      value: 1.5
      op: gte
  integrations:
    - name: webhook-notification
      project: default
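Once saved to a file, the definition above can be applied with sloctl. A minimal sketch, assuming the example is stored in a file named alert-policy.yaml:

```
# Apply the Alert Policy definition (the file name is an assumption for this example)
sloctl apply -f alert-policy.yaml
```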


DataExport

DataExport defines the configuration for exporting your data from Nobl9.

💡 Note: DataExport is a premium feature. Please contact your Nobl9 sales representative to enable the DataExport feature.

apiVersion: n9/v1alpha
kind: DataExport
metadata:
  name: string
  displayName: string # optional
  project: string
spec:
  exportType: S3 | Snowflake | GCS
  spec:
    bucketName: string # Required. Name of the S3 or GCS bucket.
    # Required for the S3 and Snowflake export types. This is the Amazon Resource Name (ARN)
    # of the role that Nobl9 assumes to upload files.
    roleArn: string
apiVersion: n9/v1alpha
kind: DataExport
metadata:
  name: s3-data-export
  displayName: S3 data export
  project: default
spec:
  exportType: S3
  spec:
    bucketName: examplebucket
    roleArn: arn:aws:iam::123456789:role/example
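For comparison, here is a minimal sketch of a GCS export (the bucket name is invented; per the schema above, roleArn applies only to the S3 and Snowflake export types):

```
apiVersion: n9/v1alpha
kind: DataExport
metadata:
  name: gcs-data-export
  displayName: GCS data export
  project: default
spec:
  exportType: GCS
  spec:
    bucketName: examplebucket # hypothetical bucket name
```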