Monitoring XML servers
Introduction
Monitoring of the XML server is available through:
- The XML server statistics, described in Monitoring XML server statistics
- The JVM that the XML server runs on, described in Monitoring the XML server JVM
- Health checks, described in XML server health checks
Monitoring XML server statistics
SPM XML servers produce statistics for each job processed. For more information about XML server architecture, see XML server architecture in the Social Program Management XML Infrastructure Guide.
Out of the box, these statistics are written to text files in the `/opt/ibm/Curam/xmlserver/stats/` folder, but they are not directly accessible to Prometheus or other monitoring tools.
This topic describes a sample implementation to make XML server statistics available to Prometheus.
The ability to access XML server statistics in a Kubernetes deployment requires using the XML server `-forcestatswrite` start option.
This option is only available in SPM V8.0.1 or higher. For more information about Statistics and starting the XML server, see Statistics and Starting the XML server in the Social Program Management XML Infrastructure Guide.
Sample implementation
The sample implementation is based on a proof-of-concept performed to illustrate the capability. Artifacts used for the proof-of-concept are provided as samples in the GitHub repo.
Notable decisions and limitations from the proof-of-concept:
- The Prometheus client_java library is used to implement the Prometheus collector and exporter; hence, it is coded in Java.
- A sidecar container is used to run the Prometheus collector and exporter, as this allows for:
  - Avoiding any Kubernetes-specific or Prometheus-specific code changes to the XML server code base. This is achieved by the use of persistent storage, which is described in more detail below.
  - Avoiding additional processes in the xmlserver pod, which could impact the main xmlserver process.
At a high level, providing XML server statistics to Prometheus involves the following steps:
- Implementing a Prometheus collector and exporter to make the `/opt/ibm/Curam/xmlserver/stats/ThreadPoolWorker-*` file contents available to Prometheus
- Creating a Docker image for the sidecar container that contains the Prometheus collector and exporter
- Defining a sidecar container in the `xmlserver` deployment
- Defining the necessary Prometheus configuration for the Kubernetes cluster. The sample deployment provided uses the OpenShift PodMonitor.
The role of persistent storage in monitoring XML server statistics
Persistent storage is required by this sample implementation so that:
- The `xmlserver-metrics` sidecar container has access to the same persistent file system as the `xmlserver` container.
- As the XML server writes statistics to the files in its `./stats` folder, those same statistics records are available to the `xmlserver-metrics` sidecar.
- No changes are required to the XML server code base or the `xmlserver` container.
The following describes the files involved in the use of persistent storage and how they interact:
- For the `xmlserver-metrics` container, `dockerfiles/Liberty/xmlserver-metrics/XMLServer-metrics.Dockerfile` creates the `/opt/ibm/Curam/xmlserver/` folder and the `./tmp` and `./stats` subfolders in its local file system.
- In `helm-charts/xmlserver/templates/deployment.yaml`, the `xmlserver-metrics` container defines the same `volumeMounts:` as the `xmlserver` container, and they both reference the deployment's `persistentVolumeClaim:`.
- In the `xmlserver-metrics` container, the `start-prometheus_collector_exporter.sh` startup shell script ensures that the persistent mount point is available and creates symbolic links from the container's file system to the persistent file system.
More details on the above files can be found in the relevant sections that follow.
Prometheus collector and exporter
This section describes the Prometheus collector and exporter code provided in the `xmlserver_prometheus.jar` and `xmlserver_prometheus-sources.jar` files that reside in the `dockerfiles/Liberty/xmlserver-metrics/` folder.
You may find the following information useful in understanding the sample implementation and how to develop an implementation suitable for your specific environment and use of the XML server.
Choosing XML server statistics
For more information about the available statistics, see Statistics in the Social Program Management XML Infrastructure Guide.
| Statistic | Description |
|---|---|
| Success | `true`, `false` (failed) |
| Job preview type | `PDF`, `HTML`, `TEXT`, `RTF` |
| Elapsed connection | Elapsed time (milliseconds) from connection start to connection close |
| Elapsed job | Job run time (milliseconds) |
| Elapsed job preview send | Time (milliseconds) to send the preview data to the client |
| Job preview data length | Length of the preview data (in bytes) sent to the client |
| Timestamp | Timestamp (Java timestamp value) when the secure connection entered the system |
| Template ID | Template ID processed |
| Template version | Template version number |
| Template locale | Locale of the template |
For the purposes of the proof-of-concept, we used the following statistics as illustrative:
- Success - reported as success or failure
- Job type - PDF, HTML, TEXT, and RTF
- Elapsed connection
- Job preview data length
Your own requirements may differ, depending on your application usage.
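As an illustration, the subset of statistics chosen for the proof-of-concept could be carried in a simple value class such as the following. This class is a hypothetical aid for the sketches later in this topic; the field names and types are assumptions, not part of the packaged sample.

    // Hypothetical value class for the statistics fields used in the later sketches.
    // Field names and types are assumptions based on the statistics table above.
    public class StatsRecord {
        public boolean success;       // Success: true, or false (failed)
        public String type;           // Job preview type: PDF, HTML, TEXT, or RTF
        public long elapsedMillis;    // Elapsed connection time, in milliseconds
        public long dataLengthBytes;  // Job preview data length, in bytes
    }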
Coding the Prometheus metric variables
The following Prometheus client_java API calls illustrate how the proof-of-concept statistics above are coded as metrics:
Jobs run:
    /** Prometheus counter - of xmlserver jobs. */
    private static final Counter jobTotals = Counter.build()
        .namespace("curam_xmlserver")
        .name("jobs_total")
        .help("Job counts with labels for status=success|fail ; type=PDF|HTML|TEXT|RTF")
        .labelNames("status", "type")
        .register();
Job elapsed, or connect, time:
    /** Prometheus histogram - job connect time. */
    static final Histogram jobConnectTime = Histogram.build()
        .namespace("curam_xmlserver")
        .name("job_connect_seconds")
        .help("Job connect time")
        .buckets(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
            1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0)
        .register();
Job bytes processed:
    /** Prometheus histogram - job data length. */
    static final Histogram jobDataLength = Histogram.build()
        .namespace("curam_xmlserver")
        .name("job_data_length_bytes")
        .help("Job preview data length")
        .buckets(5000, 7000, 9000, 10000, 12000, 14000, 15000,
            20000, 25000, 30000, 35000, 50000)
        .register();
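For illustration, once one statistics record has been parsed, the collector can update these metrics as follows. This is a sketch only; `rec` stands for a parsed record such as the hypothetical `StatsRecord` shown earlier, and the millisecond-to-second conversion assumes the elapsed statistic is recorded in milliseconds.

    // Sketch: record one parsed statistics entry (rec) against the metrics defined above.
    jobTotals.labels(rec.success ? "success" : "fail", rec.type).inc();  // for example, labels("success", "PDF")
    jobConnectTime.observe(rec.elapsedMillis / 1000.0);                  // connect-time buckets are in seconds
    jobDataLength.observe(rec.dataLengthBytes);                          // preview data length, in bytes

With the namespace and name values above, these appear to Prometheus as the `curam_xmlserver_jobs_total`, `curam_xmlserver_job_connect_seconds`, and `curam_xmlserver_job_data_length_bytes` metric families.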
Java code for the Prometheus collector and exporter
This is a high-level, pseudo-code description of the Java implementation of the Prometheus collector and exporter:
`main()` method:
- Process optional arguments:
  - `-httpPort <port>` - Port that Prometheus will listen on. Defaults to `8080`.
  - `-checkInterval <seconds>` - Number of seconds between checks for data in the `./stats/` folder and checks for termination. Defaults to `10`.
- Start the collector thread
- Start the exporter thread
Collector thread:
- Identify all `ThreadPoolWorker-*` files in the `./stats` folder
- While the thread is not interrupted:
  - Iterate over each file:
    - If there is content:
      - If the content has not been previously read:
        - Create a `java.io.RandomAccessFile` in read mode
        - Bypass previously read content
        - Process each new line in the file:
          - Call the `io.prometheus.client.Counter.inc()` method to increment status (`success` | `fail`) and type (`PDF` | `HTML` | `TEXT` | `RTF`)
          - Call the `io.prometheus.client.Histogram.observe(double amt)` method for elapsed time
          - Call the `io.prometheus.client.Histogram.observe(double amt)` method for bytes processed
        - Record the new number of bytes read from the file
  - Sleep for the specified interval
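The following is a minimal sketch of such a collector loop, assuming it lives in the same class as the metric fields shown earlier and that a hypothetical `parseRecord()` helper splits each statistics line into the `StatsRecord` fields sketched above. It is illustrative only; the actual file format and the packaged sample's class structure may differ.

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical collector loop, not the packaged sample code: it tails each
    // stats/ThreadPoolWorker-* file and records each new statistics line against
    // the metric objects defined earlier.
    static void runCollector(File statsDir, int checkIntervalSeconds) {
        Map<File, Long> bytesRead = new HashMap<>();   // read position per statistics file

        while (!Thread.currentThread().isInterrupted()) {
            File[] statsFiles =
                statsDir.listFiles((dir, name) -> name.startsWith("ThreadPoolWorker-"));
            if (statsFiles != null) {
                for (File statsFile : statsFiles) {
                    long alreadyRead = bytesRead.getOrDefault(statsFile, 0L);
                    if (statsFile.length() <= alreadyRead) {
                        continue;                      // no new content in this file
                    }
                    try (RandomAccessFile raf = new RandomAccessFile(statsFile, "r")) {
                        raf.seek(alreadyRead);         // bypass previously read content
                        String line;
                        while ((line = raf.readLine()) != null) {
                            StatsRecord rec = parseRecord(line);   // hypothetical parsing helper
                            jobTotals.labels(rec.success ? "success" : "fail", rec.type).inc();
                            jobConnectTime.observe(rec.elapsedMillis / 1000.0);
                            jobDataLength.observe(rec.dataLengthBytes);
                        }
                        bytesRead.put(statsFile, raf.getFilePointer());   // remember new read position
                    } catch (IOException e) {
                        // Log and continue; the file is retried on the next pass.
                    }
                }
            }
            try {
                Thread.sleep(checkIntervalSeconds * 1000L);   // sleep for the specified interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();           // preserve the interrupt and exit
            }
        }
    }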
Exporter thread:
- Using the specified port, create a new `io.prometheus.client.exporter.HTTPServer(int port)`
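A minimal sketch of how the `main()` method described earlier could wire the collector and exporter together follows; only the client_java `HTTPServer` API is real, while the class name, the `runCollector()` call (from the previous sketch), and the argument handling are assumptions.

    import java.io.File;
    import io.prometheus.client.exporter.HTTPServer;

    // Hypothetical wiring of the main() method described earlier; the class name
    // and runCollector() are assumptions, not the packaged sample's code.
    public class XmlServerStatsExporter {

        public static void main(String[] args) throws Exception {
            int httpPort = 8080;       // -httpPort default
            int checkInterval = 10;    // -checkInterval default, in seconds

            for (int i = 0; i < args.length - 1; i++) {
                if ("-httpPort".equals(args[i])) {
                    httpPort = Integer.parseInt(args[++i]);
                } else if ("-checkInterval".equals(args[i])) {
                    checkInterval = Integer.parseInt(args[++i]);
                }
            }

            // Collector thread: the tail-reading loop sketched earlier.
            final int interval = checkInterval;
            Thread collector = new Thread(() -> runCollector(new File("./stats"), interval));
            collector.start();

            // Exporter: serves the registered metrics on http://localhost:<httpPort>/metrics.
            HTTPServer exporter = new HTTPServer(httpPort);
        }
    }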
Creating the Docker image for the sidecar container
In support of building the sidecar Docker image, the following files are included in the `dockerfiles/Liberty/xmlserver-metrics` folder of the GitHub repo:
- The Dockerfile that builds the image: `XMLServer-metrics.Dockerfile`
- The sample proof-of-concept implementation executable, `xmlserver_prometheus.jar`, and source, `xmlserver_prometheus-sources.jar`
- The start shell script, `./xmlserver-metrics/start-prometheus_collector_exporter.sh`, which is invoked by the `xmlserver-metrics` deployment's `cmd:` structure and, when invoked:
  - Ensures the persistent storage mount point is available and the expected directories exist.
  - Starts the Prometheus collector/exporter Java program in the background.
  - Waits for the Java program to terminate and logs the end of the process.
- The stop shell script, `./xmlserver-metrics/stop-prometheus_collector_exporter.sh`, which is invoked by the `xmlserver-metrics` deployment's `lifecycle:` structure and, when invoked:
  - Cleans up the `.pid` file created when the Java process was started.
Building the Docker image
Building the `xmlserver-metrics` image uses a similar process to building the other Docker images.
To build the sidecar image, run the commands that match your environment.

Bash, with the `BASE_REGISTRY` build argument:

    cd $SPM_HOME/dockerfiles/Liberty/
    docker build \
      --tag $DOCKER_REGISTRY/$PROJECT/xmlserver-metrics:latest \
      --build-arg "BASE_REGISTRY=registry.connect.redhat.com/" \
      --file xmlserver-metrics/XMLServer-metrics.Dockerfile .

Bash, without the `BASE_REGISTRY` build argument:

    cd $SPM_HOME/dockerfiles/Liberty/
    docker build \
      --tag $DOCKER_REGISTRY/$PROJECT/xmlserver-metrics:latest \
      --file xmlserver-metrics/XMLServer-metrics.Dockerfile .

PowerShell, with the `BASE_REGISTRY` build argument:

    cd $env:SPM_HOME\dockerfiles\Liberty\
    docker build `
      --tag $env:DOCKER_REGISTRY/$env:PROJECT/xmlserver-metrics:latest `
      --build-arg "BASE_REGISTRY=registry.redhat.io/" `
      --file xmlserver-metrics\XMLServer-metrics.Dockerfile .

PowerShell, without the `BASE_REGISTRY` build argument:

    cd $env:SPM_HOME\dockerfiles\Liberty\
    docker build `
      --tag $env:DOCKER_REGISTRY/$env:PROJECT/xmlserver-metrics:latest `
      --file xmlserver-metrics\XMLServer-metrics.Dockerfile .
Deploying the sidecar container and configuring for Prometheus
Sample files are provided to define the sidecar and Prometheus configuration. The `xmlserver-metrics` sidecar is installed by Helm only when the following settings are provided:

    global.apps.common.persistence.enabled: true
    global.useBetaFeatures: true
    xmlserver.metrics.enabled: true
For more information see the XML Server Configuration Reference.
The following files are included in the `helm-charts/xmlserver/` folder of the GitHub repo for deploying the `xmlserver-metrics` sidecar container and configuring it for the XML server:
- `./templates/deployment.yaml` - defines the `xmlserver-metrics` sidecar
- `./templates/monitor-xmlserver.yaml` - defines a `PodMonitor`
The Prometheus port, `xmlmetrics`, referenced in both deployment files defaults to `8080`.
Confirming the status of the xmlserver-metrics container
Once deployed, a Kubernetes `get pods` command shows how many of the containers are ready. For the `xmlserver` deployment, it should show `2/2` (two of two) ready.

For deployment issues, use the Kubernetes `describe` command against the pod to confirm the deployment of the `xmlserver-metrics` container.

You can view the `xmlserver-metrics` stdout by using the Kubernetes `logs` command. You must specify the container with `-c xmlserver-metrics`; otherwise, the command defaults to the `xmlserver` container.

While the `xmlserver-metrics` container is running, log4j output is written to `StatsForPrometheus.log` in the pod's persistent `tmp` folder: `/opt/ibm/Curam/xmlserver/tmp`.
Interacting with the XML server statistics
While ultimately you will want to view the XML server statistics in Prometheus or Grafana (not covered here), command-line interactions are useful for testing and for quickly getting performance indicators.
For instance:
- Using the `curl` command in a pod's shell, you can get the current metrics (remember that with two containers your Kubernetes `exec` command must use `-c xmlserver-metrics` to enter the correct shell), for example:
  - `curl -s http://localhost:8080/metrics` - displays all the Prometheus data and help text
  - `curl -s http://localhost:8080/metrics | grep "curam_xmlserver_jobs_total.status=.success"` - displays the count of successful XML server jobs
  - `curl -s http://localhost:8080/metrics | grep "curam_xmlserver_jobs_total.status"` - displays the count of successful and failed jobs
Monitoring the XML server JVM
The performance of the Java Virtual Machine (JVM) running the XML server directly impacts the performance of the XML server.
The `xmlserver` image includes the Prometheus JMX Exporter, and you can configure your deployment to enable monitoring of the XML server JVM. Your Helm override file needs to specify the following to enable the monitoring:

    global.useBetaFeatures: true
    xmlserver.jvmStats.enabled: true

Additionally, you can configure the port and the JMX Exporter configuration. This is the default configuration:

    xmlserver.jvmStats.port: 8083
    xmlserver.jvmStats.configYaml: ''
How you configure the JMX Exporter will depend upon how you consume the data. For reference:
- See the Prometheus JMX Exporter configuration documentation
- See this sample Grafana JVM dashboard
XML server health checks
A readiness probe and a liveness probe are available for `xmlserver` deployments. These probes complement each other to ensure that the `xmlserver` service is not accessed until it is ready and that it remains available.
In support of the probes, a folder, `xmlserverprobes`, has been added to the `/opt/ibm/Curam/xmlserver` folder in `xmlserver` containers.
You enable these probes in your Helm override file. Depending on your local environment, you may need to adjust the timing values to achieve successful operation. See the XML Server Configuration Reference topic and the K8s Configure Liveness, Readiness and Startup Probes page for more information.
readinessProbe
The readinessProbe depends on this info-level log4j message: "XML Server awaiting connections on port". Thus, if you have modified the default `xmlserver` log4j configuration to suppress this message, you cannot use the readinessProbe.
livenessProbe
The livenessProbe adds messages such as these to the `/opt/ibm/Curam/xmlserver/tmp/xmlserver.log` file for each probe invocation (default 30 seconds):

    [xmlserver] [INFO] [XMLConnectionHandler] Mon Apr 03 12:28:02 BST 2023 Connection received.
    [xmlserver] [INFO] [XMLConnectionHandler] Awaiting job configuration.
    [xmlserver] [INFO] [XMLConnectionHandler] Probe check received.