What are the best practices for setting up the fluentd buffer in a multi-tenant scenario?
I have used the fluent-operator to set up a multi-tenant fluentbit and fluentd logging solution, where fluentbit collects and enriches the logs, and fluentd aggregates and ships them to AWS OpenSearch.
The operator uses a label router to separate logs from different tenants.
In my cluster, every time a new application is deployed via its Helm chart, the chart applies the following resources:
apiVersion: fluentd.fluent.io/v1alpha1
kind: FluentdConfig
metadata:
  name: -fluentd-config
  labels:
    config.fluentd.fluent.io/enabled: "true"
spec:
  clusterFilterSelector:
    matchLabels:
      filter.fluentd.fluent.io/enabled: "true"
      filter.fluentd.fluent.io/tenant: core
  outputSelector:
    matchLabels:
      output.fluentd.fluent.io/enabled: "true"
      output.fluentd.fluent.io/tenant:
  watchedLabels:
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: Output
metadata:
  name: fluentd-output-
  labels:
    output.fluentd.fluent.io/tenant:
    output.fluentd.fluent.io/enabled: "true"
spec:
  outputs:
    - customPlugin:
        config: |
          <match **>
            @type opensearch
            host "${FLUENT_OPENSEARCH_HOST}"
            port 443
            logstash_format true
            logstash_prefix logs-
            scheme https
            log_os_400_reason true
            @log_level ${FLUENTD_OUTPUT_LOGLEVEL:=info}
            <buffer>
              ...
            </buffer>
            <endpoint>
              url "https://${FLUENT_OPENSEARCH_HOST}"
              region "${FLUENT_OPENSEARCH_REGION}"
              assume_role_arn "#{ENV['AWS_ROLE_ARN']}"
              assume_role_web_identity_token_file "#{ENV['AWS_WEB_IDENTITY_TOKEN_FILE']}"
            </endpoint>
          </match>
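The empty label values and name suffixes above are rendered per application by the Helm chart. A hypothetical template fragment to illustrate the idea (the value names `.Values.app.name` and `.Values.app.tenant` are my own, not from the operator):

```yaml
# Hypothetical Helm template fragment -- value names are illustrative only.
apiVersion: fluentd.fluent.io/v1alpha1
kind: Output
metadata:
  name: fluentd-output-{{ .Values.app.name }}
  labels:
    output.fluentd.fluent.io/tenant: {{ .Values.app.tenant }}
    output.fluentd.fluent.io/enabled: "true"
```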
So for every new application a new <match> section is created, and with it a new buffer configuration for that application:
<ROOT>
  <system>
    rpc_endpoint "127.0.0.1:24444"
    log_level info
    workers 1
  </system>
  <source>
    @type forward
    bind "0.0.0.0"
    port 24224
  </source>
  <match **>
    @id main
    @type label_router
    <route>
      @label "@c9ce9b26357ba0a190e4d01fbf7ef506"
      <match>
        labels app:app-name
        namespaces app-namespace
      </match>
    </route>
  </match>
  <label @33b5ad9c15abdec648ede544d80f80f5>
    <filter **>
      @type dedot
      de_dot_separator "_"
      de_dot_nested true
    </filter>
    <match **>
      @type opensearch
      host "XXXX.us-west-2.es.amazonaws.com"
      port 443
      logstash_format true
      logstash_prefix "logs-XXX"
      scheme https
      log_os_400_reason true
      @log_level "info"
      <buffer>
        ...
      </buffer>
      <endpoint>
        url https://XXXX.us-west-2.es.amazonaws.com
        region "us-west-2"
        assume_role_arn "arn:aws:iam::XXX:role/raas/fluentd-os-access-us-west-2"
        assume_role_web_identity_token_file "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
      </endpoint>
    </match>
  </label>
  <match **>
    @type null
    @id main-no-output
  </match>
  <label @FLUENT_LOG>
    <match fluent.*>
      @type null
      @id main-fluentd-log
    </match>
  </label>
</ROOT>
To sum it up, I'll have a buffer for every pod that enables log collection in the Helm chart.
If I had to configure a single buffer for the whole cluster, I would use something like this:
<buffer>
  @type memory
  flush_mode interval
  flush_interval ${FLUENTD_BUFFER_FLUSH_INTERVAL:=60s}
  flush_thread_count 1
  retry_type exponential_backoff
  retry_max_times 10
  retry_wait 1s
  retry_max_interval 60s
  chunk_limit_size 8MB
  total_limit_size 512MB
  overflow_action throw_exception
  compress gzip
</buffer>
This buffer was based on the default values in fluentd's documentation.
But this obviously doesn't scale: I can't run dozens, or maybe even hundreds, of applications/pods each with the buffer configuration above, because it would exhaust fluentd's resources.
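To put rough numbers on it: each per-application memory buffer can hold up to its total_limit_size (512 MB here), so worst-case fluentd memory grows linearly with the number of applications. A quick back-of-the-envelope check (the application counts are hypothetical):

```python
# Worst-case fluentd memory consumed by per-application buffers,
# assuming every buffer fills to its total_limit_size (512 MB each).
TOTAL_LIMIT_MB = 512

def worst_case_buffer_gb(num_apps: int) -> float:
    """Worst-case memory (in GB) if every application's buffer is full."""
    return num_apps * TOTAL_LIMIT_MB / 1024

for apps in (10, 50, 100):  # hypothetical application counts
    print(f"{apps:>3} apps -> up to {worst_case_buffer_gb(apps):.0f} GB of buffer memory")
# prints:
#  10 apps -> up to 5 GB of buffer memory
#  50 apps -> up to 25 GB of buffer memory
# 100 apps -> up to 50 GB of buffer memory
```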
How can I define a base "micro-buffer" that would be enough for most pods/applications?