Map of exposed HDFS systems
Map of exposed HDFS systems [via Shodan]

Improperly configured HDFS-based servers, mostly Hadoop installs, are exposing over five petabytes of information, according to John Matherly, founder of Shodan, a search engine for discovering Internet-connected devices.

The expert says he discovered 4,487 instances of HDFS-based servers available via public IP addresses and without authentication, which in total exposed over 5,120 TB of data.

According to Matherly, 47,820 MongoDB servers exposed only 25 TB of data. To put things in perspective, HDFS servers leak 200 times more data compared to MongoDB servers, which are ten times more prevalent. A report from Binary Edge from 2015 revealed that at the time, Redis, MongoDB, Memcached, and ElasticSearch servers put together exposed a tota of only 1.1 PB of data.

Most HDFS systems are located in the US and China

The countries that exposed the most HDFS instances are by far the US and China, but this should be of no surprise as these two countries host over 50% of all data centers in the world.

Top 10 Countries:
+++++++++++++++++++++++++
US                  1,900
CN                  1,426
DE                  129
KR                  115
FR                  91
SG                  86
IE                  82
IN                  74
TW                  66
CA                  43

Earlier this year, attackers realized they could take over unprotected servers exposed online, steal their content, and demand a ransom. Attacks first targeted MongoDB, but they soon moved to CouchDB and Hadoop.

Initially, 124 Hadoop servers were ransomed, a number which eventually grew to almost 500. According to Matherly, there are still 207 HDFS-based clusters that still feature ransom demands, albeit it's unclear if these are leftovers from the January attacks, or servers are being hijacked even as we speak.

HDFS stands for Hadoop Distributed File System and is a distributed file system designed to run on cloud-based servers. It is the core technology that powers the Apache Hadoop technology, but it's also deployed with custom solutions. HDFS systems are usually found in cloud hosting environments and are used to store and process large quantities of user data.

Instructions on how to configure a Hadoop server to run in secure mode are available here. Matherly has published the necessary steps to reproduce his research.