Observability

FSM’s observability stack includes Prometheus for metrics collection, Grafana for metrics visualization, Jaeger for tracing, and Fluent Bit for forwarding logs to a user-defined endpoint.

1 - Metrics

Proxy and FSM control plane Prometheus metrics

FSM generates detailed metrics related to all traffic within the mesh and to the FSM control plane. These metrics provide insights into the behavior of applications in the mesh and of the mesh itself, helping users troubleshoot, maintain, and analyze their applications.

FSM collects metrics directly from the sidecar proxies (Pipy). With these metrics, the user can get information about the overall traffic volume, errors within traffic, and response times for requests.

Additionally, FSM generates metrics for the control plane components. These metrics can be used to monitor the behavior and health of the service mesh.

FSM uses Prometheus to gather and store consistent traffic metrics and statistics for all applications running in the mesh. Prometheus is an open-source monitoring and alerting toolkit which is commonly used on (but not limited to) Kubernetes and Service Mesh environments.

Each application that is part of the mesh runs in a Pod that contains a Pipy sidecar exposing metrics (proxy metrics) in the Prometheus format. Furthermore, every Pod that is part of the mesh and in a namespace with metrics enabled has Prometheus annotations, which make it possible for the Prometheus server to scrape the application dynamically. This mechanism automatically enables scraping of metrics whenever a pod is added to the mesh.

FSM metrics can be viewed with Grafana, an open source visualization and analytics tool that allows you to query, visualize, alert on, and explore your metrics.

Grafana uses Prometheus as its backend time-series database. If Grafana and Prometheus are deployed through the FSM installation, the necessary rules are set up at deployment time for them to interact. Conversely, in a “Bring-Your-Own” or “BYO” model (explained further below), installation of these components is handled by the user.

Installing Metrics Components

FSM can either provision Prometheus and Grafana instances at install time or FSM can connect to an existing Prometheus and/or Grafana instance. We call the latter pattern “Bring-Your-Own” or “BYO”. The sections below describe how to configure metrics by allowing FSM to automatically provision the metrics components and with the BYO method.

Automatic Provisioning

By default, both Prometheus and Grafana are disabled.

However, when FSM is installed with the --set=fsm.deployPrometheus=true flag, it will deploy a Prometheus instance to scrape the sidecars’ metrics endpoints. Based on the metrics scraping configuration set by the user, FSM will annotate pods that are part of the mesh with the necessary metrics annotations so that Prometheus can reach and scrape the pods to collect relevant metrics. The scraping configuration file defines the default Prometheus behavior and the set of metrics collected by FSM.

To install Grafana for metrics visualization, pass the --set=fsm.deployGrafana=true flag to the fsm install command. FSM provides a pre-configured dashboard that is documented in FSM Grafana dashboards.

 fsm install --set=fsm.deployPrometheus=true \
             --set=fsm.deployGrafana=true

Note: The Prometheus and Grafana instances deployed automatically by FSM have simple configurations that do not include high availability, persistent storage, or locked down security. If production-grade instances are required, pre-provision them and follow the BYO instructions on this page to integrate them with FSM.

Bring-Your-Own

Prometheus

The following section documents the additional steps needed to allow an already running Prometheus instance to poll the endpoints of an FSM mesh.

List of Prerequisites for BYO Prometheus
  • An accessible Prometheus instance already running outside of the mesh.
  • A running FSM control plane instance, deployed without the metrics stack.
  • These steps assume that letting Grafana reach Prometheus, exposing or forwarding the Prometheus or Grafana web ports, and configuring Prometheus to reach the Kubernetes API services are already taken care of, or otherwise out of the scope of these steps.
Configuration
  • Make sure the Prometheus instance has appropriate RBAC rules to be able to reach both the pods and the Kubernetes API; this may depend on the specific requirements and situation of your deployment (see the ClusterRole sketch after this list):
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy",  "nodes/metrics", "services", "endpoints", "pods", "ingresses", "configmaps"]
  verbs: ["list", "get", "watch"]
- apiGroups: ["extensions"]
  resources: ["ingresses", "ingresses/status"]
  verbs: ["list", "get", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
  • If desired, use the Prometheus Service definition to allow Prometheus to scrape itself:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "<API port for prometheus>" # Depends on deployment - FSM automatic deployment uses 7070 by default, controlled by `values.yaml`
  • Amend Prometheus’ configmap to reach the pods/Pipy endpoints. FSM automatically appends the port annotations to the pods and takes care of pushing the listener configuration to the pods for Prometheus to reach:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: source_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: source_pod_name
  - regex: '(__meta_kubernetes_pod_label_app)'
    action: labelmap
    replacement: source_service
  - regex: '(__meta_kubernetes_pod_label_fsm_sidecar_uid|__meta_kubernetes_pod_label_pod_template_hash|__meta_kubernetes_pod_label_version)'
    action: drop
  - source_labels: [__meta_kubernetes_pod_controller_kind]
    action: replace
    target_label: source_workload_kind
  - source_labels: [__meta_kubernetes_pod_controller_name]
    action: replace
    target_label: source_workload_name
  - source_labels: [__meta_kubernetes_pod_controller_kind]
    action: replace
    regex: ^ReplicaSet$
    target_label: source_workload_kind
    replacement: Deployment
  - source_labels:
    - __meta_kubernetes_pod_controller_kind
    - __meta_kubernetes_pod_controller_name
    action: replace
    regex: ^ReplicaSet;(.*)-[^-]+$
    target_label: source_workload_name
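
For reference, below is a minimal sketch of how the RBAC rules above could be wrapped in a ClusterRole and bound to the Prometheus service account. The prometheus-fsm-scrape name and the prometheus/monitoring service account and namespace are assumptions; adjust them to your deployment:

kubectl apply -f - <<EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-fsm-scrape   # hypothetical name
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods", "ingresses", "configmaps"]
  verbs: ["list", "get", "watch"]
- apiGroups: ["extensions"]
  resources: ["ingresses", "ingresses/status"]
  verbs: ["list", "get", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-fsm-scrape   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-fsm-scrape
subjects:
- kind: ServiceAccount
  name: prometheus      # assumed service account used by your Prometheus
  namespace: monitoring # assumed namespace of your Prometheus deployment
EOF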

Grafana

The following section assumes a Prometheus instance has already been configured as a data source for a running Grafana instance. Refer to the Prometheus and Grafana demo for an example on how to create and configure a Grafana instance.

Importing FSM Dashboards

FSM dashboards are available in our repository and can be imported as JSON blobs through the Grafana web admin portal.

Detailed instructions for importing FSM dashboards can be found in the Prometheus and Grafana demo. Refer to FSM Grafana dashboard for an overview of the pre-configured dashboards.

Metrics scraping

Metrics scraping can be configured using the fsm metrics command. By default, FSM does not configure metrics scraping for pods in the mesh. Metrics scraping can be enabled or disabled at namespace scope, so that pods belonging to configured namespaces are enabled or disabled for metrics scraping.

For metrics to be scraped, the following prerequisites must be met:

  • The namespace must be a part of the mesh, i.e. it must be labeled with the flomesh.io/monitored-by label with an appropriate mesh name. This can be done using the fsm namespace add command.
  • A running Prometheus instance able to scrape the pods’ metrics endpoints. FSM provides configuration for an automatic bring-up of Prometheus; alternatively, users can bring their own Prometheus.

To enable one or more namespaces for metrics scraping:

fsm metrics enable --namespace test
fsm metrics enable --namespace "test1, test2"

To disable one or more namespaces for metrics scraping:

fsm metrics disable --namespace test
fsm metrics disable --namespace "test1, test2"

Enabling metrics scraping on a namespace also causes the fsm-injector to add the following annotations to pods in that namespace:

prometheus.io/scrape: true
prometheus.io/port: 15010
prometheus.io/path: /stats/prometheus
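
To confirm the injector applied these annotations, you can inspect a pod in an enabled namespace (substitute your own pod name and namespace):

kubectl get pod <pod-name> -n test -o jsonpath='{.metadata.annotations}'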

Available Metrics

FSM exports metrics about the traffic within the mesh as well as metrics about the control plane.

Custom Pipy Metrics

To implement the SMI Metrics Specification, the Pipy proxy in FSM generates the following statistics for HTTP traffic:

fsm_request_total: a counter that increments with each request the proxy handles. By querying this metric, you can see the success and failure rates of requests for the services in the mesh.

fsm_request_duration_ms: a histogram that records the duration of proxied requests in milliseconds. Query this metric to understand the latency between services in the mesh.

Both metrics have the following labels.

source_kind: the Kubernetes resource kind of the workload that generated the request, e.g. Deployment, DaemonSet, etc.

destination_kind: the Kubernetes resource kind of the workload that handled the request, e.g. Deployment, DaemonSet, etc.

source_name: the name of the Kubernetes workload that generated the request.

destination_name: the name of the Kubernetes workload that handled the request.

source_pod: the name of the Kubernetes pod that generated the request.

destination_pod: the name of the Kubernetes pod that handled the request.

source_namespace: the Kubernetes namespace of the workload that generated the request.

destination_namespace: the Kubernetes namespace of the workload that handled the request.

In addition, the fsm_request_total metric has a response_code label that indicates the HTTP status code of the request, e.g. 200, 404, etc.
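
As an illustration, the PromQL sketches below build on these metrics; they assume the fsm_request_duration_ms histogram is exported with the standard Prometheus _bucket series:

# Success rate per destination service over the last 5 minutes
sum(rate(fsm_request_total{response_code=~"2.."}[5m])) by (destination_name)
  / sum(rate(fsm_request_total[5m])) by (destination_name)

# 99th percentile request latency (in milliseconds) between workloads
histogram_quantile(0.99,
  sum(rate(fsm_request_duration_ms_bucket[5m])) by (le, source_name, destination_name))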

Control Plane

The following metrics are exposed in the Prometheus format by the FSM control plane components. The fsm-controller and fsm-injector pods have the following Prometheus annotations.

annotations:
   prometheus.io/scrape: 'true'
   prometheus.io/port: '9091'
Metric | Type | Labels | Description
fsm_k8s_api_event_count | Count | type, namespace | Number of events received from the Kubernetes API Server
fsm_proxy_connect_count | Gauge | | Number of proxies connected to the FSM controller
fsm_proxy_reconnect_count | Count | | Number of proxy reconnects to the FSM controller
fsm_proxy_response_send_success_count | Count | proxy_uuid, identity, type | Number of responses successfully sent to proxies
fsm_proxy_response_send_error_count | Count | proxy_uuid, identity, type | Number of responses that errored when being sent to proxies
fsm_proxy_config_update_time | Histogram | resource_type, success | Histogram to track time spent for proxy configuration
fsm_proxy_broadcast_event_count | Count | | Number of ProxyBroadcast events published by the FSM controller
fsm_proxy_xds_request_count | Count | proxy_uuid, identity, type | Number of XDS requests made by proxies
fsm_proxy_max_connections_rejected | Count | | Number of proxy connections rejected due to the configured max connections limit
fsm_cert_issued_count | Count | | Total number of XDS certificates issued to proxies
fsm_cert_issued_time | Histogram | | Histogram to track time spent to issue an XDS certificate
fsm_admission_webhook_response_total | Count | kind, success | Total number of admission webhook responses generated
fsm_error_err_code_count | Count | err_code | Number of error codes generated by FSM
fsm_http_response_total | Count | code, method, path | Number of HTTP responses sent
fsm_http_response_duration | Histogram | code, method, path | Duration in seconds of HTTP responses sent
fsm_feature_flag_enabled | Gauge | feature_flag | Represents whether a feature flag is enabled (1) or disabled (0)
fsm_conversion_webhook_resource_total | Count | kind, success, from_version, to_version | Number of resources converted by conversion webhooks
fsm_events_queued | Gauge | | Number of events seen but not yet processed by the control plane
fsm_reconciliation_total | Count | kind | Counter of resource reconciliations invoked

Error Code Metrics

When an error occurs in the FSM control plane the ErrCodeCounter Prometheus metric is incremented for the related FSM error code. For the complete list of error codes and their descriptions, see FSM Control Plane Error Code Troubleshooting Guide.

The fully-qualified name of the error code metric is fsm_error_err_code_count.

Note: Metrics corresponding to errors that result in process restarts might not be scraped in time.
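
For example, a PromQL sketch to watch the rate at which each error code is being generated:

sum(rate(fsm_error_err_code_count[5m])) by (err_code)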

Query metrics from Prometheus

Before you begin

Ensure that you have followed the steps to run the FSM Demo.

Querying proxy metrics for request count

  1. Verify that the Prometheus service is running in your cluster
    • In Kubernetes, execute the following command: kubectl get svc fsm-prometheus -n <fsm-namespace>
    • Note: <fsm-namespace> refers to the namespace where the FSM control plane is installed.
  2. Open up the Prometheus UI
    • Ensure you are in the root of the repository and execute the following script: ./scripts/port-forward-prometheus.sh
    • Visit http://localhost:7070 in your web browser
  3. Execute a Prometheus query
    • In the “Expression” input box at the top of the web page, enter the text sidecar_cluster_upstream_rq_xx{sidecar_response_code_class="2"} and click the Execute button
    • This query will return the successful HTTP requests


Visualize metrics with Grafana

List of Prerequisites for Viewing Grafana Dashboards

Ensure that you have followed the steps to run the FSM Demo.

Viewing a Grafana dashboard for service to service metrics

  1. Verify that the Prometheus service is running in your cluster
    • In Kubernetes, execute the following command: kubectl get svc fsm-prometheus -n <fsm-namespace>
  2. Verify that the Grafana service is running in your cluster
    • In Kubernetes, execute the following command: kubectl get svc fsm-grafana -n <fsm-namespace>
  3. Open up the Grafana UI
    • Ensure you are in the root of the repository and execute the following script: ./scripts/port-forward-grafana.sh
    • Visit http://localhost:3000 in your web browser
  4. The Grafana UI will request login details; use the following default settings:
    • username: admin
    • password: admin
  5. View the Grafana dashboard for service to service metrics: the FSM Service to Service Metrics dashboard displays the traffic metrics between services in the mesh.

FSM Grafana dashboards

FSM provides several pre-configured Grafana dashboards to display and track service-related information captured by Prometheus:

  1. FSM Data Plane

    • FSM Data Plane Performance Metrics: This dashboard lets you view the performance of FSM’s data plane
    • FSM Service to Service Metrics: This dashboard lets you view the traffic metrics from a given source service to a given destination service
    • FSM Pod to Service Metrics: This dashboard lets you investigate the traffic metrics from a pod to all the services it connects/talks to
    • FSM Workload to Service Metrics: This dashboard provides the traffic metrics from a workload (Deployment, ReplicaSet) to all the services it connects/talks to
    • FSM Workload to Workload Metrics: This dashboard displays the latencies of requests in the mesh from workload to workload
  2. FSM Control Plane

    • FSM Control Plane Metrics: This dashboard provides traffic metrics from a given service to FSM’s control plane
    • Mesh and Pipy Details: This dashboard lets you view the performance and behavior of FSM’s control plane

2 - Tracing

Tracing with Jaeger

FSM allows optional deployment of Jaeger for tracing. Tracing can be enabled and customized during installation (via the tracing section in values.yaml) or at runtime by editing the fsm-mesh-config custom resource. Tracing can be enabled, disabled, and configured at any time to support BYO scenarios.

When FSM is deployed with tracing enabled, the FSM control plane uses the user-provided tracing information to direct the Pipy proxies to send traces when and where appropriate. If tracing is enabled without user-provided values, the defaults in values.yaml are used. The tracing-address value tells all Pipy proxies injected by FSM the FQDN to send tracing information to.

FSM supports tracing with applications that use the Zipkin protocol.

Jaeger

Jaeger is an open source distributed tracing system used for monitoring and troubleshooting distributed systems. It allows you to get fine-grained metrics and distributed tracing information across your setup so that you can observe which microservices are communicating, where requests are going, and how long they are taking. You can use it to inspect specific requests and responses to see how and when they happen.

When tracing is enabled, Jaeger is capable of receiving spans from Pipy in the mesh that can then be viewed and queried on Jaeger’s UI via port-forwarding.

FSM CLI offers the capability to deploy a Jaeger instance with FSM’s installation, but bringing your own managed Jaeger and configuring FSM’s tracing to point to it later is also supported.

Automatically Provision Jaeger

By default, Jaeger deployment and tracing as a whole are disabled.

A Jaeger instance can be automatically deployed by using the --set=fsm.deployJaeger=true FSM CLI flag at install time. This will provision a Jaeger pod in the mesh namespace.

Additionally, FSM has to be instructed to enable tracing on the proxies; this is done via the tracing section on the MeshConfig.

The following command will both deploy Jaeger and configure the tracing parameters according to the address of the newly deployed instance of Jaeger during FSM installation:

fsm install --set=fsm.deployJaeger=true,fsm.tracing.enable=true

This default bring-up uses the All-in-one Jaeger executable that launches the Jaeger UI, collector, query, and agent.

BYO (Bring-your-own)

This section documents the additional steps needed to allow an already running instance of Jaeger to integrate with your FSM control plane.

NOTE: This guide outlines steps specifically for Jaeger, but you may use your own tracing application instance with applicable values. FSM supports tracing with applications that use the Zipkin protocol.

Prerequisites

  • A running Jaeger instance, reachable from within the cluster.

Tracing Values

The sections below outline how to make the required updates, depending on whether you already have FSM installed or are deploying tracing and Jaeger during FSM installation. In either case, the following tracing values in values.yaml need to be updated to point to your Jaeger instance:

  1. enable: set to true to tell the Pipy connection manager to send tracing data to a specific address (cluster)
  2. address: set to the destination cluster of your Jaeger instance
  3. port: set to the destination port for the listener that you intend to use
  4. endpoint: set to the destination’s API or collector endpoint where the spans will be sent to
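
Put together, a sketch of the corresponding tracing section in values.yaml, using the sample Jaeger values from this page (adjust address, port and endpoint to your own instance):

fsm:
  tracing:
    enable: true
    address: jaeger.fsm-system.svc.cluster.local
    port: 9411
    endpoint: "/api/v2/spans"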

a) Enable tracing after FSM control plane has already been installed

If you already have FSM running, tracing values must be updated in the FSM MeshConfig using:

# Tracing configuration with sample values
kubectl patch meshconfig fsm-mesh-config -n fsm-system -p '{"spec":{"observability":{"tracing":{"enable":true,"address": "jaeger.fsm-system.svc.cluster.local","port":9411,"endpoint":"/api/v2/spans"}}}}'  --type=merge

You can verify these changes have been deployed by inspecting the fsm-mesh-config resource:

kubectl get meshconfig fsm-mesh-config -n fsm-system -o jsonpath='{.spec.observability.tracing}{"\n"}'

b) Enable tracing at FSM control plane install time

To deploy your own instance of Jaeger during FSM installation, you can use the --set flag as shown below to update the values:

fsm install --set fsm.tracing.enable=true,fsm.tracing.address=<tracing server hostname>,fsm.tracing.port=<tracing server port>,fsm.tracing.endpoint=<tracing server endpoint>
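
For example, pointing at the sample Jaeger instance used elsewhere on this page (adjust the values to your own tracing backend):

fsm install --set fsm.tracing.enable=true,fsm.tracing.address=jaeger.fsm-system.svc.cluster.local,fsm.tracing.port=9411,fsm.tracing.endpoint=/api/v2/spans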

View the Jaeger UI with Port-Forwarding

Jaeger’s UI is running on port 16686. To view the web UI, you can use kubectl port-forward:

JAEGER_POD=$(kubectl get pods -n "$K8S_NAMESPACE" --no-headers --selector app=jaeger | awk 'NR==1{print $1}')

kubectl port-forward -n "$K8S_NAMESPACE" "$JAEGER_POD" 16686:16686

Navigate to http://localhost:16686/ in a web browser to view the UI.

Example of Tracing with Jaeger

This section walks through the process of creating a simple Jaeger instance and enabling tracing with Jaeger in FSM.

  1. Run the FSM Demo with Jaeger deployed. You have two options:

    • For automatic provisioning of Jaeger, simply set DEPLOY_JAEGER in your .env file to true

    • For bring-your-own, you can deploy the sample instance provided by Jaeger using the commands below. If you wish to bring up Jaeger in a different namespace, make sure to update it below.

      Create the Jaeger service.

      kubectl apply -f - <<EOF
      ---
      kind: Service
      apiVersion: v1
      metadata:
        name: jaeger
        namespace: fsm-system
        labels:
          app: jaeger
      spec:
        selector:
          app: jaeger
        ports:
        - protocol: TCP
          # Service port and target port are the same
          port: 9411
        type: ClusterIP
      EOF
      

      Create the Jaeger deployment.

      kubectl apply -f - <<EOF
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: jaeger
        namespace: fsm-system
        labels:
          app: jaeger
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: jaeger
        template:
          metadata:
            labels:
              app: jaeger
          spec:
            containers:
            - name: jaeger
              image: jaegertracing/all-in-one
              args:
                - --collector.zipkin.host-port=9411
              imagePullPolicy: IfNotPresent
              ports:
              - containerPort: 9411
              resources:
                limits:
                  cpu: 500m
                  memory: 512M
                requests:
                  cpu: 100m
                  memory: 256M
      EOF
      
  2. Enable tracing and pass in applicable values. If you have installed Jaeger in a different namespace, replace fsm-system below.

    kubectl patch meshconfig fsm-mesh-config -n fsm-system -p '{"spec":{"observability":{"tracing":{"enable":true,"address": "jaeger.fsm-system.svc.cluster.local","port":9411,"endpoint":"/api/v2/spans"}}}}'  --type=merge
    
  3. Refer to the instructions above to view the web UI using port forwarding.

  4. In the browser, you should see a Service dropdown which allows you to select from the various applications deployed by the bookstore demo.

    a) Select a service to view all spans from it. For example, if you select bookbuyer with a Lookback of one hour, you can see its interactions with bookstore-v1 and bookstore-v2 sorted by time.

    Jaeger UI search for bookbuyer traces

    b) Click on any item to view it in further detail

    c) Select multiple items to compare traces. For example, you can compare the bookbuyer’s interactions with bookstore-v1 and bookstore-v2 at a particular moment in time:

    bookbuyer interactions with bookstore-v1 and bookstore-v2

    d) Click on the System Architecture tab to view a graph of how the various applications have been interacting/communicating. This provides an idea of how traffic is flowing between the applications.

    Directed acyclic graph of bookstore demo application interactions

If you are not seeing the bookstore demo applications in the Jaeger UI, tail the bookbuyer logs to ensure that the applications are successfully interacting.

POD="$(kubectl get pods -n "$BOOKBUYER_NAMESPACE" --show-labels --selector app=bookbuyer --no-headers | grep -v 'Terminating' | awk '{print $1}' | head -n1)"

kubectl logs "${POD}" -n "$BOOKBUYER_NAMESPACE" -c bookbuyer --tail=100 -f

Expect to see:

"MAESTRO! THIS TEST SUCCEEDED!"

Seeing this message means the demo applications are interacting successfully, which suggests the issue lies in the Jaeger or tracing configuration rather than in the applications themselves.

Integrate Jaeger Tracing In Your Application

Jaeger tracing does not come effort-free: for Jaeger to connect requests into traces automatically, it is the application’s responsibility to propagate the tracing information correctly.

In FSM’s sidecar proxy configuration, Zipkin is currently used as the HTTP tracer, so an application can leverage Zipkin-supported headers to provide tracing information. In the initial request of a trace, the Zipkin plugin generates the required HTTP headers. An application should propagate the headers below if it needs to add subsequent requests to the current trace (see the sketch after this list):

  • x-request-id
  • x-b3-traceid
  • x-b3-spanid
  • x-b3-parentspanid
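
For illustration, a shell sketch of what propagation looks like at the HTTP level: a service copies the headers it received on the inbound request onto its outbound call so both requests join the same trace. The URL and environment variable names here are hypothetical; in practice this is typically handled by an HTTP client middleware or tracing library:

# Hypothetical sketch: the inbound headers were captured into environment variables
curl http://bookstore.bookstore.svc.cluster.local:14001/buy-a-book \
  -H "x-request-id: ${INBOUND_X_REQUEST_ID}" \
  -H "x-b3-traceid: ${INBOUND_X_B3_TRACEID}" \
  -H "x-b3-spanid: ${INBOUND_X_B3_SPANID}" \
  -H "x-b3-parentspanid: ${INBOUND_X_B3_PARENTSPANID}"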

Troubleshoot Tracing/Jaeger

When tracing is not working as expected, try the following steps.

1. Verify that tracing is enabled

Ensure the enable key in the tracing configuration is set to true:

kubectl get meshconfig fsm-mesh-config -n fsm-system -o jsonpath='{.spec.observability.tracing.enable}{"\n"}'
true

2. Verify the tracing values being set are as expected

If tracing is enabled, you can verify the specific address, port and endpoint being used for tracing in the fsm-mesh-config resource:

kubectl get meshconfig fsm-mesh-config -n fsm-system -o jsonpath='{.spec.observability.tracing}{"\n"}'

To verify that the Pipy proxies point to the FQDN you intend to use, check the value of the address key.

3. Verify the tracing values being used are as expected

To dig one level deeper, you may also check whether the values set by the MeshConfig are being correctly used. Use the command below to get the config dump of the pod in question and save the output in a file.

fsm proxy get config_dump -n <pod-namespace> <pod-name> > <file-name>

Open the file in your favorite text editor and search for pipy-tracing-cluster. You should be able to see the tracing values in use. Example output for the bookbuyer pod:

"name": "pipy-tracing-cluster",
      "type": "LOGICAL_DNS",
      "connect_timeout": "1s",
      "alt_stat_name": "pipy-tracing-cluster",
      "load_assignment": {
       "cluster_name": "pipy-tracing-cluster",
       "endpoints": [
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "jaeger.fsm-system.svc.cluster.local",
              "port_value": 9411
        [...]

4. Verify that the FSM Controller was installed with Jaeger automatically deployed [optional]

If you used automatic bring-up, you can additionally check for the Jaeger service and Jaeger deployment:

# Assuming FSM is installed in the fsm-system namespace:
kubectl get services -n fsm-system -l app=jaeger

NAME     TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
jaeger   ClusterIP   10.99.2.87   <none>        9411/TCP   27m
# Assuming FSM is installed in the fsm-system namespace:
kubectl get deployments -n fsm-system -l app=jaeger

NAME     READY   UP-TO-DATE   AVAILABLE   AGE
jaeger   1/1     1            1           27m

5. Verify Jaeger pod readiness, responsiveness and health

Check if the Jaeger pod is running in the namespace you have deployed it in:

The commands below are specific to FSM’s automatic deployment of Jaeger; substitute namespace and label values for your own tracing instance as applicable:

kubectl get pods -n fsm-system -l app=jaeger

NAME                     READY   STATUS    RESTARTS   AGE
jaeger-8ddcc47d9-q7tgg   1/1     Running   5          27m

To get information about the Jaeger instance, use kubectl describe pod and check the Events in the output.

kubectl describe pod -n fsm-system -l app=jaeger

3 - Logs

Diagnostic logs from the FSM control plane

FSM control plane components log diagnostic messages to stdout to aid in managing a mesh.

In the logs, users can expect to see the following kinds of information alongside messages:

  • Kubernetes resource metadata, like names and namespaces
  • mTLS certificate common names

FSM will not log sensitive information, such as:

  • Kubernetes Secret data
  • entire Kubernetes resources

Verbosity

Log verbosity controls when certain log messages are written, for example to include more messages for debugging or to include fewer messages that only point to critical errors.

FSM defines the following log levels in order of increasing verbosity:

Log level | Purpose
disabled | Disables logging entirely
panic | Currently unused
fatal | For unrecoverable errors resulting in termination, usually on startup
error | For errors that may require user action to resolve
warn | For recovered errors or unexpected conditions that may lead to errors
info | For messages indicating normal behavior, such as acknowledging some user action
debug | For extra information useful in figuring out why a mesh may not be working as expected
trace | For extra verbose messages, used primarily for development

Each of the above log levels can be configured in the MeshConfig at spec.observability.fsmLogLevel or on install with the fsm.controllerLogLevel chart value.
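
For example, to raise an installed mesh's verbosity to debug, a sketch following the same MeshConfig patch pattern used for tracing earlier on this page:

kubectl patch meshconfig fsm-mesh-config -n fsm-system -p '{"spec":{"observability":{"fsmLogLevel":"debug"}}}' --type=merge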

Fluent Bit

When enabled, Fluent Bit can collect these logs, process them and send them to an output of the user’s choice such as Elasticsearch, Azure Log Analytics, BigQuery, etc.

Fluent Bit is an open source log processor and forwarder which allows you to collect data/logs and send them to multiple destinations. It can be used with FSM to forward FSM controller logs to a variety of outputs/log consumers by using its output plugins.

FSM provides log forwarding by optionally deploying a Fluent Bit sidecar to the FSM controller using the --set=fsm.enableFluentbit=true flag during installation. The user can then define where FSM logs should be forwarded using any of the available Fluent Bit output plugins.

Configuring Log Forwarding with Fluent Bit

By default, the Fluent Bit sidecar is configured to simply send logs to the Fluent Bit container’s stdout. If you have installed FSM with Fluent Bit enabled, you may access these logs using kubectl logs -n <fsm-namespace> <fsm-controller-name> -c fluentbit-logger. This command will also show you how your logs are formatted, in case you need to change your parsers and filters.

Note: <fsm-namespace> refers to the namespace where the fsm control plane is installed.

To quickly bring up Fluent Bit with default values, use the --set=fsm.enableFluentbit option:

fsm install --set=fsm.enableFluentbit=true

By default, logs will be filtered to emit info level logs. You may change the log level to “debug”, “warn”, “fatal”, “panic”, “disabled”, or “trace” during installation using --set fsm.controllerLogLevel=<desired log level>. To get all logs, set the log level to trace.
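
For example, to capture everything while debugging, a sketch combining the two flags shown above:

fsm install --set=fsm.enableFluentbit=true --set=fsm.controllerLogLevel=trace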

Once you have tried out this basic setup, we recommend configuring log forwarding to your preferred output for more informative results.

To customize log forwarding to your output, follow these steps and then reinstall FSM with Fluent Bit enabled.

  1. Find the output plugin you would like to forward your logs to in the Fluent Bit documentation. Replace the [OUTPUT] section in fluentbit-configmap.yaml with appropriate values.

  2. The default configuration uses CRI log format parsing. If you are using a Kubernetes distribution that causes your logs to be formatted differently, you may need to add a new parser to the [PARSER] section and change the parser name in the [INPUT] section to one of the parsers defined here.

  3. Explore available Fluent Bit filters and add as many [FILTER] sections as desired (see the sketch after these steps).

    • The [INPUT] section tags ingested logs with kube.*, so make sure to include a Match kube.* key/value pair in each of your custom filters.
    • The default configuration uses a modify filter to add a controller_pod_name key/value pair, which helps you query logs in your output by refining results on pod name (see example usage below).
  4. For these changes to take effect, run:

    make build-fsm
    
  5. Once you have updated the Fluent Bit ConfigMap template, you can deploy Fluent Bit during FSM installation using:

    fsm install --set=fsm.enableFluentbit=true [--set fsm.controllerLogLevel=<desired log level>]
    

    You should now be able to interact with error logs in the output of your choice as they get generated.
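
As referenced in step 3, below is a sketch of a custom [FILTER] section that uses Fluent Bit's grep plugin to keep only error-level records. The level field name is an assumption; match it to the fields your parser actually produces:

[FILTER]
    # Keep only records whose parsed "level" field matches "error" (assumed field name)
    Name    grep
    Match   kube.*
    Regex   level error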

Example: Using Fluent Bit to send logs to Azure Monitor

Fluent Bit has an Azure output plugin that can be used to send logs to an Azure Log Analytics workspace as follows:

  1. Create a Log Analytics workspace

  2. Navigate to your new workspace in the Azure Portal. Find your Workspace ID and Primary key under Agents management in your workspace. In values.yaml, under fluentBit, update outputPlugin to azure and set the workspaceId and primaryKey keys to the corresponding values from the Azure Portal (without quotes). Alternatively, you may replace the entire [OUTPUT] section in fluentbit-configmap.yaml as you would for any other output plugin.

  3. Run through steps 2-5 above.

  4. Once you run FSM with Fluent Bit enabled, logs will populate under the Logs > Custom Logs section in your Log Analytics workspace. There, you may run the following query to view most recent logs first:

    fluentbit_CL
    | order by TimeGenerated desc
    
  5. Refine your log results on a specific deployment of the FSM controller pod:

    | where controller_pod_name_s == "<desired fsm controller pod name>"
    

Once logs have been sent to Log Analytics, they can also be consumed by Application Insights as follows:

  1. Create a Workspace-based Application Insights instance.

  2. Navigate to your instance in Azure Portal. Go to the Logs section. Run this query to ensure that logs are being picked up from Log Analytics:

    workspace("<your-log-analytics-workspace-name>").fluentbit_CL
    

You can now interact with your logs in either of these instances.

Note: Fluent Bit is not currently supported on OpenShift.

Configuring Outbound Proxy Support for Fluent Bit

You may require outbound proxy support if your egress traffic is configured to go through a proxy server. There are two ways to enable this.

If you have already built FSM with the Fluent Bit configuration changes above, you can enable proxy support using the FSM CLI, replacing your values in the command below:

fsm install --set=fsm.enableFluentbit=true,fsm.fluentBit.enableProxySupport=true,fsm.fluentBit.httpProxy=<http-proxy-host:port>,fsm.fluentBit.httpsProxy=<https-proxy-host:port>

Alternatively, you may change the values in the Helm chart by updating the following in values.yaml:

  1. Change enableProxySupport to true

  2. Update the httpProxy and httpsProxy values to "http://<host>:<port>". If your proxy server requires basic authentication, you may include its username and password as: http://<username>:<password>@<host>:<port>

  3. For these changes to take effect, run:

    make build-fsm
    
  4. Install FSM with Fluent Bit enabled:

    fsm install --set=fsm.enableFluentbit=true
    

NOTE: Ensure that the Fluent Bit image tag is 1.6.4 or greater as it is required for this feature.