
How to send error notifications through td-agent immediately while reporting only a daily count of acknowledged errors


A short while ago, I added Slack notifications to our product’s td-agent setup.

Then I ran into a problem. Some errors are already acknowledged by everyone and do not need an immediate notification.
For those, we only want to know how many times they occurred each day, so that we do not miss it if the number suddenly spikes.

I solved this by combining the rewrite, grepcounter and slack plugins.
Here is the configuration I used in the td-agent.conf file. Values written as {{ ... }} are template placeholders.

# Retrieve Error Log
<source>
  type tail
  path {{ path }}
  format multiline
  format_firstline /^\d{4}-\d{2}-\d{2}/
  format1 /^(?<text>.*)/
  tag raw.app.errorlog.{{ hostname }}
  pos_file /var/tmp/app_log.pos.slack
</source>

# Filter Acknowledged Error
<match raw.app.errorlog.{{ hostname }}>
  type rewrite
  add_prefix filtered
  <rule>
    key           text
    pattern       FileNotFoundException
    append_to_tag true
    tag           FileNotFoundException
  </rule>
</match>

# Notify Error to the Slack channel
<match filtered.raw.app.errorlog.{{ hostname }}>
  type slack
  webhook_url {{ webhook_url }}
  channel {{ channel }}
  username ERROR_NOTIFIER
  message '{{ td_agent_app_errorlog_mention }}```[host] {{ hostname }} [Path] {{ errorlog_path }}``` %s'
  message_keys text
  color warning
  flush_interval 10s
</match>

# Notify the count of Acknowledged Errors filtered above
<match filtered.raw.app.errorlog.{{ hostname }}.FileNotFoundException>
  type grepcounter
  count_interval 86400 # = 24 hours
  input_key text
  threshold 1
  add_tag_prefix count
</match>
<match count.filtered.raw.app.errorlog.{{ hostname }}.FileNotFoundException>
  type slack
  webhook_url {{ webhook_url }}
  channel {{ channel }}
  username EXISTING_ERROR_NOTIFIER
  icon_emoji :admission_tickets:
  message_keys count
  message '```FileNotFoundException occurred %s times in the last 24 hours at {{ hostname }}```'
  color #FFB6C1
  flush_interval 10s
</match>
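
The way a record's tag changes as it passes through the configuration above can be traced with a small sketch. This is a simplified Python model of the routing, not the plugins' actual implementation; the host name `web01` and the function names `rewrite_tag` and `route` are assumptions made up for illustration.

```python
import re

HOSTNAME = "web01"  # hypothetical value standing in for {{ hostname }}
ACKNOWLEDGED = ["FileNotFoundException"]  # patterns taken from the <rule> block


def rewrite_tag(tag: str, text: str) -> str:
    """Model of the rewrite step: add the 'filtered' prefix and, if an
    acknowledged pattern is found in the record text, append it to the tag."""
    new_tag = f"filtered.{tag}"
    for pattern in ACKNOWLEDGED:
        if re.search(pattern, text):
            return f"{new_tag}.{pattern}"
    return new_tag


def route(tag: str) -> str:
    """Model of the two <match> blocks that follow the rewrite step.
    Each one matches a single exact tag."""
    base = f"filtered.raw.app.errorlog.{HOSTNAME}"
    if tag == base:
        return "slack_immediate"
    if tag == f"{base}.FileNotFoundException":
        return "daily_count"
    return "unmatched"


raw_tag = f"raw.app.errorlog.{HOSTNAME}"
# An ordinary error keeps the plain filtered tag and goes straight to Slack.
print(route(rewrite_tag(raw_tag, "2017-01-01 12:00:00 NullPointerException at Foo")))
# An acknowledged error gets the extra tag suffix and is only counted.
print(route(rewrite_tag(raw_tag, "2017-01-01 12:00:05 java.io.FileNotFoundException: /tmp/x")))
```

The point of the sketch is that the appended suffix produces a different tag, so the acknowledged error no longer matches the block that notifies Slack immediately.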

When an error occurs, a notification like this is sent to the Slack channel immediately.

[Image: application_errorlog]

The count of the acknowledged error is reported once a day, like this.

[Image: existing_error_notifier]

It shows how many times the error occurred, on which host it occurred, and a link to the Jira ticket that describes the error in detail.
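
The daily count itself can be pictured as grouping matching records into 24-hour windows and reporting only the windows that reach the threshold. The sketch below is a rough Python model under that assumption; `daily_counts` is a hypothetical helper, and aligning windows to multiples of 86400 seconds is a simplification, since the real plugin presumably counts on its own timer rather than on fixed clock boundaries.

```python
from collections import Counter

COUNT_INTERVAL = 86400  # seconds, same value as count_interval in the config
THRESHOLD = 1           # same value as threshold in the config


def daily_counts(timestamps, interval=COUNT_INTERVAL, threshold=THRESHOLD):
    """Group epoch-second timestamps into fixed windows and keep only the
    windows whose count reaches the threshold (window start -> count)."""
    windows = Counter(ts // interval for ts in timestamps)
    return {
        window * interval: count
        for window, count in sorted(windows.items())
        if count >= threshold
    }


# Hypothetical arrival times (in seconds) of FileNotFoundException records.
seen = [0, 10, 86399, 86400, 200000]
for window_start, count in daily_counts(seen).items():
    print(f"FileNotFoundException occurred {count} times "
          f"in the window starting at {window_start}")
```

In this model, a window with no matching records produces no entry at all, so with a threshold of 1 a day without the error would produce no count message.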

Tell me what you think of this article! 👉️ @curryisdrink