Summary

If you have noticed slow services (e.g. Jira, Confluence, Jenkins) on Technology Nursery lately, I suspect the server was hit by an opportunistic malware attack that used its CPU for bitcoin mining. Here is what I did to identify and fix the problem.

The malware has been removed. However, more work is needed to reduce the attack surface: Confluence still has to be backed up and upgraded to the latest version.

Symptom

Today, Confluence was very slow to respond. Grafana and top both showed CPU utilization at 100%. The following processes were taking up most of the CPU:

  • dblaunchs (3x processes)
  • khugepageds

Problem

dblaunchs

A search for the symptom uncovered this "strange behavior" thread from the Atlassian Community. The reply from Atlassian Support reads:

Based on your symptoms, it sounds like your instance was affected by an opportunistic attack against the CVE-2019-3396 Widget Connector vulnerability from March 20th (see Confluence Security Advisory - 2019-03-20). We've seen an infection going around that injects malware and the bitcoin miner it tries to run uses all the CPU available on the box. Initially the kerberods malware was being deployed as the payload, but other attacks might be trying to inject different payloads.

I'd recommend tackling things in this order:

    1. Kill malicious processes
    2. Clean up your crontab
    3. Upgrade Confluence
    4. Use a malware scanner to find remaining malware traces

Malicious processes

The top command will help you find processes (probably running under the confluence user account) that are consuming a large amount of CPU. If Confluence is currently stopped, you can probably plan on killing any processes running as the confluence user. Note the process ID (pid) from the top output and then kill the process using kill -9 followed by the pid. Example:

sudo kill -9 12395
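
As a non-interactive complement to top, the filter below is a minimal sketch for listing the busiest processes in a form that can be saved or pasted elsewhere. The helper name high_cpu and the 50% default threshold are hypothetical and not taken from the thread. Keep in mind that ps reports %CPU averaged over a process's lifetime, so a freshly started miner may read lower here than it does in top.

```shell
# Hypothetical helper: keep rows whose %CPU (column 2) is at or above a threshold.
# Expects lines shaped like "PID %CPU USER COMMAND", as printed by
#   ps -eo pid=,pcpu=,user=,comm=
high_cpu() {
    awk -v min="${1:-50}" '($2 + 0) >= (min + 0)'
}

# Example: ps -eo pid=,pcpu=,user=,comm= | high_cpu 50
```

The first column of each surviving row is the pid to pass to kill.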

Clean up your crontab

Since most malware adds a cronjob that relaunches the malware every few minutes, you'll also need to check the crontab file and remove any suspicious-looking entries. For Ubuntu, this is stored in the /var/spool/cron/crontabs/ directory. Normally you should use the crontab command to edit the crontab, but for cleanup purposes we'll be inspecting the file for any pre-existing entries.

Using vim (or whichever text editor you're comfortable with), you'll open the file and remove suspicious-looking jobs.

sudo vim /var/spool/cron/crontabs/confluence

Confluence comes up on system startup through the SysV/systemd daemons, so we would expect the confluence user's crontab to not exist under normal circumstances. It's most likely the case that any entries in this file are malicious, but make sure you check them before deleting them entirely.
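
A relaunch entry may also sit in another account's crontab, or in system locations such as /etc/crontab and /etc/cron.d, so it can help to review every crontab file rather than only the confluence one. The sketch below assumes a hypothetical helper named active_cron_entries that prints only the active lines of a file, which makes unexpected entries easier to spot. It only reads the file and removes nothing.

```shell
# Hypothetical helper: print only the active lines of a crontab file,
# skipping blank lines and comments.
active_cron_entries() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example (run as root; the path is the Ubuntu location mentioned above):
#   for f in /var/spool/cron/crontabs/*; do echo "== $f"; active_cron_entries "$f"; done
```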

Upgrade Confluence

Once your CPU is under control and new malicious processes aren't spawning, you need to upgrade Confluence to a version that isn't affected by the vulnerability. I'd recommend looking at one of these versions (latest releases as of this post):

Use a malware scanner

Finally, you need to clean up any remaining traces of malware on your system. The LSD malware cleanup tool will be useful for removing the Kerberods malware. Other malware payloads might need different cleanup tools depending on which attack and payload were used. A good starting place for detecting other types of infections is the set of scanners linked here. Once a particular infection is identified, googling for "____ removal tool" is a good place to start if the scanner was unable to remove the malware automatically.

Please let me know if you have more questions!
Daniel | Atlassian Support

khugepageds

Stack Overflow has this thread on Jenkins High CPU Usage khugepageds.

Solution

  • Clean up any errant cron jobs.
    No errant cron jobs found.
  • Kill all of the errant processes.
    CPU came back down to normal, but crept back up as those processes were restarted by some cron job script.
  • Kill all errant processes and immediately restart server s14 (the server that hosts Jenkins and Confluence).
    This time, the processes stayed down.
  • Rebuild Jenkins image with the latest LTS version 2.164.2
  • Start Technology Nursery services: technologynursery.start
  • On Jenkins, upgrade all plugins
    CPU utilization has stayed normal (<2%) for the past two hours. Confluence is responsive again.
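
To confirm that the processes stay down after a cleanup like the one above, a check along these lines could be re-run every few minutes. This is a minimal sketch, and the helper name check_procs is hypothetical. It relies on pgrep -x, which matches the whole process name, because khugepageds appears to imitate the legitimate kernel thread khugepaged.

```shell
# Hypothetical helper: report whether processes with these exact names are running.
# pgrep -x matches the whole process name, so the legitimate kernel thread
# "khugepaged" does not match the look-alike name "khugepageds".
check_procs() {
    for name in "$@"; do
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name: running"
        else
            echo "$name: not running"
        fi
    done
}

# Example: check_procs dblaunchs khugepageds
```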

Preventative Maintenance

  • Ralph A. Navarro Jr.  Back up and upgrade Confluence to the latest version as recommended by Atlassian.

