NodeJS App Tracing & Monitoring with OpenTelemetry

In this example, I'll walk through enabling observability (tracing, monitoring and logging) for a NodeJS application using OpenTelemetry. The first and second sections cover the concepts of observability and OpenTelemetry, so if you are already familiar with them you can skip ahead to the implementation.

Observability

What does observability mean?

Observability helps us understand a system, and troubleshoot and handle problems from the outside, without needing to know the system's internal processes. In other words, observability means measuring and inferring the internal state of a system's distributed applications by examining the outputs they produce.

To observe a system properly, the system must output signals such as traces, metrics and logs, which are known as the three pillars of observability.

  • Logs: timestamped messages emitted by the system that describe what happened in a specific service or component.

  • Metrics: statistical data about the infrastructure or the application, aggregated over a period of time.

  • Traces: records of the context and the path a request takes through a multi-service architecture. A trace is made of one or more spans: a root span, and spans underneath it, each of which tracks a specific operation the request triggers.
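To make the trace/span relationship concrete, here is a small plain-JavaScript sketch of a trace as a tree of spans. This is a simplified model for illustration only, not the real OpenTelemetry data structures; the span names are hypothetical.

```javascript
// Simplified model: every span records its parent, and all spans in
// a trace share the trace's id. Not the real OpenTelemetry types.
function makeSpan(name, parent = null) {
  return {
    name,
    traceId: parent ? parent.traceId : Math.random().toString(16).slice(2, 18),
    parent,
  }
}

const rootSpan = makeSpan('HTTP GET /api/notes')          // root span: no parent
const dbSpan = makeSpan('mongodb.find', rootSpan)         // child operation
const renderSpan = makeSpan('render response', rootSpan)  // sibling child

// All spans in the trace share the root span's trace id
console.log(dbSpan.traceId === rootSpan.traceId) // true
```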

OpenTelemetry

What is OpenTelemetry?

Put simply, OpenTelemetry is how you produce and move the three pillars: it provides a set of standardized SDKs, APIs, and tools for collecting and manipulating telemetry data under one unified SDK (metrics, traces and logs are correlated, which makes monitoring easier), propagating context between services, and exporting the data to an observability back-end.

OpenTelemetry Stack

The OpenTelemetry stack consists of 3 layers:

  • The SDK, which collects the data and exports it.

  • The backend, where data is received, processed and exported somewhere (a database, for example).

  • The visualization layer, which visualizes the data (this could be any open-source vendor).

How does OpenTelemetry work?

In a nutshell, say we have a multi-service system where a user sends a request to service A, which triggers another request to service B, and service B then performs a query to store data in a database. Both service A and service B have OpenTelemetry installed. Any request sent from service A to service B runs under a context, as a child of the operation in service A. To achieve this, service A injects the context into the outgoing request headers, so OpenTelemetry in service B understands that this is a child of an operation that happened in service A. All of these operations belong to the same trace, as different spans that each point to their parent span.
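The header injection mentioned above uses the W3C Trace Context `traceparent` header by default. As a hedged illustration, here is a hand-rolled sketch of what service A puts on the wire and what service B reads back; the real work is done by OpenTelemetry's propagator, not code like this, and the ids below are example values.

```javascript
// Sketch of the W3C `traceparent` header format that OpenTelemetry's
// default propagator injects: version-traceId-spanId-flags.
function buildTraceparent(traceId, spanId) {
  return `00-${traceId}-${spanId}-01`
}

function parseTraceparent(header) {
  const [version, traceId, parentSpanId, flags] = header.split('-')
  return { version, traceId, parentSpanId, flags }
}

// Service A injects the header on the outgoing request...
const header = buildTraceparent('4bf92f3577b34da6a3ce929d0e0e4736', '00f067aa0ba902b7')
// ...and service B extracts it, so its new span joins the same trace.
const ctx = parseTraceparent(header)
console.log(ctx.traceId) // 4bf92f3577b34da6a3ce929d0e0e4736
```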

To go a bit deeper: each service has an instrumentation, which collects data and passes it to a processor; the processor can manipulate the data if you want it to, then it passes the data to an exporter, which ships it somewhere.

Instrumentation: instrumentation is something that attaches to the service and collects data during its runtime. There are two types of instrumentation: automatic and manual.

  • Automatic Instrumentation: OpenTelemetry instruments your service without touching the source code. It adds the OpenTelemetry API and SDK capabilities to your application, along with a set of instrumentation libraries and exporter dependencies. Automatic instrumentation collects data at runtime and produces spans representing the executed operations; these spans are produced based on specifications and semantic conventions. Refer to the specifications here

  • Manual Instrumentation: coding against the OpenTelemetry API to collect telemetry from your service code or shared frameworks. With manual instrumentation, you import and configure the OpenTelemetry API and SDK yourself, and create telemetry data by creating traces and metric events. You can still make use of instrumentation libraries.
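As a sketch of what manual instrumentation looks like in code: in a real app the tracer would come from trace.getTracer(serviceName) in @opentelemetry/api; here a stub with the same startActiveSpan call shape stands in so the snippet runs on its own, and listNotes and the note values are hypothetical.

```javascript
// Stub tracer with the same `startActiveSpan(name, fn)` shape as a real
// OpenTelemetry tracer. Sketch only; not the library implementation.
const stubTracer = {
  startActiveSpan(name, fn) {
    const span = {
      name,
      events: [],
      addEvent(message) { this.events.push(message) },
      end() { this.ended = true },
    }
    return fn(span)
  },
}

// Hypothetical handler: wrap the work in a span, annotate it, and end it.
function listNotes(tracer) {
  return tracer.startActiveSpan('listNotes', (span) => {
    span.addEvent('fetching notes')     // same call shape as a real span
    const notes = ['note-1', 'note-2']  // stands in for the DB query
    span.end()
    return notes
  })
}

console.log(listNotes(stubTracer)) // [ 'note-1', 'note-2' ]
```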

For more info about instrumentation, refer to the documentation here

Processor: the processor is a kind of data pipeline: it takes the collected data from the instrumentation, processes it and sends it to the exporter.

Exporter: the exporter sends the processed data somewhere, such as a vendor, a database or a visualization tool.

For more information about OpenTelemetry, refer to the documentation here

Implementation

We have an application based on two locally hosted NodeJS services: one exposes note CRUD modules, and the other just receives a GET call and returns static data.

You can clone the application code from the GitHub repo here.

We will use OpenTelemetry's NodeJS client library to instrument our application to collect telemetry data, then export it to a collector gateway; the gateway will in turn export traces to Jaeger and metrics to Prometheus.

We are going to use the following OpenTelemetry packages:

@opentelemetry/api
@opentelemetry/resources
@opentelemetry/semantic-conventions
@opentelemetry/sdk-trace-node
@opentelemetry/sdk-trace-base
@opentelemetry/instrumentation
opentelemetry-instrumentation-express
@opentelemetry/instrumentation-http
@opentelemetry/exporter-trace-otlp-http
@opentelemetry/sdk-metrics
@opentelemetry/exporter-metrics-otlp-http

Otel API Configuration

Traces Configuration (Tracer.js File)

I added a tracer.js file where I set up the application tracing and instrumentation configuration.

const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node')
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base')
const { trace } = require('@opentelemetry/api')
const { Resource } = require('@opentelemetry/resources')
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions')
const { registerInstrumentations } = require('@opentelemetry/instrumentation')
const { ExpressInstrumentation } = require('opentelemetry-instrumentation-express')
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http')
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http')

const tracing = (serviceName) => {

  // Define traces
  const provider = new NodeTracerProvider({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: serviceName
    })
  })
  const traceExporter = new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  })
  provider.addSpanProcessor(new SimpleSpanProcessor(traceExporter))
  provider.register()
  registerInstrumentations({
    instrumentations: [
      new ExpressInstrumentation(),
      new HttpInstrumentation(),
    ],
    tracerProvider: provider,
  })
  const tracer = trace.getTracer(serviceName)

  return {tracer}
}

module.exports = tracing

Then we set up the tracer configuration: we define an OTLPTraceExporter, which will export the data to the collector gateway at the provided endpoint, and a provider that gives us the ability to create traces. We then add a processor that processes spans before they are exported to the collector gateway by the defined exporter. Lastly, the configuration is registered.

We also register instrumentation libraries: ExpressInstrumentation(), which provides automatic instrumentation for Express and automatically collects data about requests coming into Express, and HttpInstrumentation(), which automatically collects data about HTTP API calls. Finally, we get the tracer and return it.

// ./controllers/notes.js
const otelApi = require('@opentelemetry/api')
......
......
......
const Init = init('notes-service')
const httpCounter = Init.meter.createCounter('http_calls')
......
......
......
const activeSpan = otelApi.trace.getSpan(otelApi.context.active())
activeSpan.addEvent('all notes were requested from the notes service', {stringNotes})

Metrics Configuration

In the monitor.js file below, we set up the metrics configuration by defining a meter from a MeterProvider, and we give the provider an OTLPMetricExporter, which will export the metrics to the collector gateway at the provided endpoint, http://localhost:4318/v1/metrics.

// monitor.js file
const { MeterProvider, PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics')
const { Resource } = require('@opentelemetry/resources')
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions')
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http')

const monitoring = (serviceName) => {
// Define metrics
  const metricExporter = new OTLPMetricExporter({
    url: 'http://localhost:4318/v1/metrics'
  })
  const meterProvider = new MeterProvider({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: serviceName,
    }),
  })
  meterProvider.addMetricReader(new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 1000,
  }))
  const meter = meterProvider.getMeter(serviceName)

  const requestCounter = meter.createCounter('requests', {
    description: 'http requests counter',
  })

  return {meter, requestCounter}
}

module.exports = monitoring

Initialization

In our app.js file we initialize the tracing and monitoring, get the requestCounter defined in the metrics configuration, and use it to count the API calls Express receives. We also import the Otel API in the router file (./controllers/notes.js) to add event logs to spans when an operation is executed.
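As an illustration of that wiring (a sketch with assumed names, not the repo's exact app.js): in the real app, requestCounter is the OpenTelemetry counter returned by monitor.js; here a stub with the same add(value, attributes) call shape stands in so the snippet runs on its own.

```javascript
// Stub with the same `add(value, attributes)` shape as an OpenTelemetry counter.
const requestCounter = {
  count: 0,
  add(value, attributes) {
    this.count += value
    this.lastAttributes = attributes
  },
}

// Express-style middleware: count every incoming request, then continue.
function countRequests(req, res, next) {
  requestCounter.add(1, { route: req.url, method: req.method })
  next()
}

// In the real app this would be registered with `app.use(countRequests)`.
countRequests({ url: '/api/notes', method: 'GET' }, {}, () => {})
console.log(requestCounter.count) // 1
```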

Collectors and Services Configuration

Collector Gateway Config

The collector gateway receives telemetry data through a receiver, which is defined first in the config file and determines how data is received (protocol, format, port, and so on). We are using the OTLP receiver.

Then we define the processors: a batch processor, and a resource processor that alters the data being sent.

We have a Prometheus exporter, which is responsible for sending the metrics to the Prometheus service; a Jaeger exporter, which exports tracing data to the Jaeger service; and a logging exporter, which logs most interactions to the console.

After configuring the receiver, processor and exporter components, we have to configure which components are enabled and which component is connected to which inside the collector gateway; we achieve that in the service section.

#collector-gateway.yml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - http://*
            - https://*

processors:
  batch:
    timeout: 1s
  resource:
    attributes:
      - key: component.name2
        value: "gateway"
        action: insert

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    send_timestamps: true
    namespace: prom
    const_labels:
      label1: value1

  logging:
    loglevel: info

  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [logging, jaeger]

    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, prometheus]

Prometheus Config

We add the Prometheus configuration; there is nothing worth mentioning other than that we point the target Prometheus scrapes metrics from at the gateway's exporter endpoint.

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus

    scrape_interval: 5s
    scrape_timeout: 2s
    honor_labels: true

    static_configs:
       - targets: ['collector-gateway:8889']

Docker Compose

Lastly, we use docker-compose to spin up all the services required to get our architecture up and running. We have the Jaeger, Prometheus and collector gateway services configured in a docker-compose file, each assigned the ports configured in the SDK code and the service configuration files. Prometheus and the gateway each refer to their configuration files.

version: '3'

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports: 
      - 5775:5775
      - 6831:6831
      - 6832:6832
      - 5778:5778
      - 16686:16686
      - 14268:14268
      - 14250:14250
      - 9411:9411

  prometheus:
    image: prom/prometheus
    volumes: 
      - ./observability/prometheus-config.yml:/etc/prometheus/prometheus.yml
    ports: 
      - 9090:9090

  collector-gateway:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/conf/collector-config.yaml"]
    volumes:
      - ./observability/collector-gateway.yml:/conf/collector-config.yaml
    ports:
      - "8888:8888"   # metrics data exposed to prometheus
      - "8889:8889"   # Prometheus exporter metrics
      - "4318:4318"   # OTLP HTTP receiver

    depends_on:
      - jaeger

Run The Application

To get all our services running, we run the following commands in order from the root directory:

  • docker-compose up

  • npm install

  • npm run dev // to run service A (the notes service)

  • node app2.js // to run service B

Note that service A uses MongoDB, so you have to provide a valid MONGODB_URI in the environment variables file.

Visit http://localhost:3001/ to access service A's UI. Service B doesn't have a UI and has a single endpoint, accessed by service A; visit the http://localhost:3001/api/notes/replies endpoint to make service A trigger service B.

You can access the Jaeger UI at http://localhost:16686/, and the Prometheus UI at http://localhost:9090/, where you can search for "http_calls_total" to see the API call count.

Project Github Link: