A security researcher revealed today that he found three misconfigured Amazon S3 buckets belonging to the US Department of Defense (DOD) containing 1.8 billion social media and forum posts made by users from all over the world, including many by Americans.
Discovered by UpGuard security researcher Chris Vickery, the databases were named "centcom-backup," "centcom-archive," and "pacom-archive."
Based on their names, it was obvious the databases belonged to US Central Command (CENTCOM) and US Pacific Command (PACOM), two of the DOD's unified combatant commands.
According to the researcher, the data contained within the databases did not include any sensitive details. Instead, the databases were assembled by scraping the Internet for publicly available social media posts, forum posts, blogs, news comments, and similar postings.
The scraped data contained the post itself and data to identify the poster. The scraped content Vickery found was written in multiple languages, mostly Arabic, Farsi, and English, and was collected between 2009 and August 2017.
Based on the data's structure inside these databases, they appeared to be part of a hybrid Lucene-Elasticsearch search engine.
According to Vickery's assessment, the databases appeared to have been put together by the US Army's intelligence unit in an attempt to mine the Internet for information it might use for operations.
A folder labeled "Outpost," found in one of the CENTCOM-labeled S3 buckets, appears to be the work of VendorX, a former DOD contractor and a maker of big data search engine technology.
After finding the databases, Vickery contacted the DOD in September, and the databases were secured soon after.
The databases were not accessible to anonymous users; instead, they required a user to have an AWS account. A free account would have been enough to access and download the data stored in the three S3 buckets.
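For readers who manage their own buckets, this kind of exposure can be checked for in a bucket's access control list (ACL). The sketch below is illustrative only and rests on an assumption: that access was granted through an ACL entry for S3's predefined "AuthenticatedUsers" group, which covers any AWS account holder. The report does not say exactly how the three buckets were configured. The function name `find_broad_grants` and the sample ACL are hypothetical; the input is assumed to be a dictionary shaped like the response of an S3 "get bucket ACL" call.

```python
# Group URIs that S3 uses in ACL grants for its predefined groups.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

RISKY_GROUPS = {
    ALL_USERS: "anyone on the Internet",
    AUTHENTICATED_USERS: "any AWS account holder",
}


def find_broad_grants(acl):
    """Return (audience, permission) pairs for grants made to broad groups.

    `acl` is assumed to look like:
    {"Grants": [{"Grantee": {"Type": ..., "URI": ...}, "Permission": ...}]}
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        # Only group grants carry a URI; owner grants use a canonical user ID.
        if grantee.get("Type") == "Group" and uri in RISKY_GROUPS:
            findings.append((RISKY_GROUPS[uri], grant.get("Permission")))
    return findings


# Hypothetical ACL for illustration only; not taken from the exposed buckets.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
            "Permission": "READ",
        },
    ]
}

for audience, permission in find_broad_grants(example_acl):
    print(f"Bucket grants {permission} to {audience}")
```

A grant to either group is worth reviewing, since "authenticated" in this context means any AWS account at all, not only accounts belonging to the bucket owner's organization.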
Last week, Amazon updated the AWS backend panel and added visible warnings for S3 buckets that are exposed online. The company made this change after many companies misconfigured their S3 buckets and accidentally exposed sensitive data.
Some might criticize the Pentagon for collecting social media posts from US citizens as part of "a secret surveillance program," but scraping the Internet is not against the law, and some private companies make a good living off such practices, sometimes selling the information back to governments in need of social media and Internet monitoring. The problem here is not the Internet scraping, but the army's inability to keep its third-party contractors in check and make sure data isn't leaking online.