This article is part of a series about APT campaigns. The other topics covered in this series are weaponization and delivery, exploitation and installation, command and control, and exfiltration.
In part one of a series on understanding the processes and tools behind an APT-based incident, CSO examines the reconnaissance phase of an attacker's campaign. This is the first of many steps, and it often helps the attacker identify whom to attack and how.
Personal Information: People are your weakest link
All too often, the information that harms an organization or person the most is something that wasn't viewed as important enough to protect to begin with. This can be anything from telephone or email directory listings and metadata within a document passed around online to an executive's full name and corporate biography.
Some information can be discovered through public records and Web searches, but not all of it. The information about a person or organization that can be found publicly is called Open Source Intelligence (OSINT), because it is freely and publicly available to anyone who knows how to find it. The problem is that, for most people, the OSINT available from any single source is usually scarce.
Chaining information, that is, linking small bits of data together until you have a full profile, is commonplace among criminals because of this scarcity. The hacktivist collective known as Anonymous is legendary for its use of "doxing" to collect personal information on someone, or something, before launching an attack. These "dox," as Anons call them, are nothing more than information chains. Hacktivists and criminals aren't the only ones who do this, however; security professionals, including law enforcement, do the same thing.
Think about how much of the information available to the public via the Web would allow an attacker to get a clearer picture of your organization and its employees. This includes data from business reports, news reports, the organization's website, social media accounts (personal and professional), as well as associated web information from business partners.
Put yourself in the attacker's shoes. And remember that at this point, it doesn't matter whether they are working for a nation-state or for themselves. With the information collected from a wide range of OSINT sources, they'll have a good idea of whom to target and why; more importantly, they'll know how to approach these people, with little to no additional research or background information needed.
The attacker will have a good indication of the person's hobbies, where they went to school, addresses and other personal information, the types of social groups they belong to, and how they interact with peers. All of this is valuable information. Unfortunately, as previously mentioned, it is also information no one thinks to protect.
Speaking of data no one thinks to protect, let's look at metadata.
Metadata: Hidden keys to your corporate kingdom
In this context, metadata is the embedded information included with documents and images. We're not talking about the metadata that the NSA collects on a minute-to-minute basis. Most people are unaware that many of the pictures they upload to the Web contain not only the location where the image was taken, but also an accurate timestamp and hardware information. When it comes to metadata within documents, from PDF to PowerPoint and everything in between, it is possible to learn software titles and versions, the document author's name, network locations, IP addresses, and more.
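Before a photo goes on a public site, a defender can at least check whether it carries an EXIF block, which is where GPS coordinates, timestamps, and camera details are typically stored. The sketch below is a minimal, standard-library-only Python check, assuming baseline JPEG input; it only walks the marker headers looking for an APP1 "Exif" segment and does not decode individual tags. The function name `jpeg_has_exif` is hypothetical.

```python
import struct

def jpeg_has_exif(data: bytes) -> bool:
    """Return True if the JPEG byte string carries an APP1 "Exif" segment.

    Only the header segments before the image data are walked; EXIF tags
    (GPS, timestamps, camera model) are not decoded.
    """
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: no more header segments
            break
        # Segment length is big-endian and includes its own two length bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False
```

Usage would be along the lines of `jpeg_has_exif(open("team_photo.jpg", "rb").read())`, where the file name is a placeholder. A `True` result means the image should be scrubbed or reviewed before publishing.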
Understanding metadata is important, because the first thing an attacker is going to collect during the reconnaissance phase is the set of publicly available documents produced by the target. Documents can easily be harvested and checked for sensitive metadata with a handy tool called FOCA (Fingerprinting Organizations with Collected Archives). While attackers will use it for reconnaissance, good guys can use it too, as it is useful when assessing internal risk.
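For a quick self-audit of your own published files, the core idea can be sketched in a few lines of Python. This is not FOCA, and it covers only Office Open XML files (.docx, .pptx, .xlsx), which are ZIP archives whose document properties are stored as XML. It assumes the conventional `docProps/core.xml` and `docProps/app.xml` part names written by Microsoft Office rather than resolving them through the package relationships, and the function names `ooxml_metadata` and `audit_folder` are hypothetical. Legacy .doc/.ppt/.xls files and PDFs would need other parsers.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter, defaultdict
from pathlib import Path

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Assumption: conventional part names; a stricter tool would follow _rels/.rels.
FIELDS = {
    "docProps/core.xml": [("author", "dc:creator"), ("last_modified_by", "cp:lastModifiedBy")],
    "docProps/app.xml": [("application", "ep:Application"), ("app_version", "ep:AppVersion"),
                         ("company", "ep:Company")],
}

def ooxml_metadata(path):
    """Return the author, editor, and software fields found in one OOXML file."""
    found = {}
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue
            root = ET.fromstring(zf.read(part))
            for key, tag in fields:
                el = root.find(tag, NS)
                if el is not None and el.text and el.text.strip():
                    found[key] = el.text.strip()
    return found

def audit_folder(folder):
    """Count how often each metadata value appears across a folder of OOXML files."""
    summary = defaultdict(Counter)
    for p in Path(folder).rglob("*"):
        if not p.is_file() or p.suffix.lower() not in {".docx", ".pptx", ".xlsx"}:
            continue
        try:
            meta = ooxml_metadata(p)
        except (zipfile.BadZipFile, ET.ParseError):
            continue  # skip corrupt or non-OOXML files rather than abort the audit
        for key, value in meta.items():
            summary[key][value] += 1
    return summary
```

Pointing `audit_folder` at a local copy of everything on the public website gives a rough inventory of exposed author names, editor IDs, and application versions: the same kind of picture an attacker would assemble from the outside.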
Here's a good example of what metadata can serve up. In 2011, a 1.2 GB torrent file published by someone representing Anonymous led many to believe that the U.S. Chamber of Commerce, the American Legislative Exchange Council (ALEC), and the Michigan-based Mackinac Center for Public Policy had suffered a data breach. Shortly afterward, it was determined that the documents had not been stolen, but collected using FOCA.
Within the document set belonging to the U.S. Chamber of Commerce, there were 194 Word documents (.doc and .docx), 724 PDF files, 59 PowerPoint files (.ppt and .pptx), and 12 Excel files (.xls and .xlsx).
By examining the metadata within these files, 293 names were found, the majority of them network IDs. In addition, 23 unique email addresses were discovered. Given the exposed network naming conventions, working out the rest would pose no challenge to an attacker creating a profile, as many of the people representing the U.S. Chamber of Commerce are easily discovered via OSINT. The data also included folder paths: internal network paths, local system paths, and webserver paths. The location and name of shared network printers were also identified.
When it comes to software, the data from the U.S. Chamber of Commerce listed more than 100 unique titles. It's true that many of the identified software titles reflect when the documents were created. Yet, given that many organizations still keep legacy software in production, knowing that there are older versions of Microsoft Office, Adobe Reader, Acrobat Distiller, or Xerox WorkCentre software on the network is valuable data for an attacker doing reconnaissance.
Also of value are the IP addresses, as well as proof that the organization was running Windows XP, Windows 2000 Server, and Windows Server 2003 at the time the documents were created. Again, while some of the data is old, the sheer amount of information exposed can serve as a starting point when targeting an organization.
While FOCA can help discover metadata, there are plenty of resources available for managing and eliminating it. A solid starting point is the guidance from Microsoft and Adobe, as well as a technical note from the National Security Agency.
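On the image side, a simple pre-publication step is to scrub photos. The following is a minimal, standard-library Python sketch, assuming baseline JPEG input, that rebuilds the file without APP1 segments (where EXIF, including any GPS coordinates, and XMP are stored) and COM comment segments, leaving the compressed image data untouched. The function name `strip_jpeg_metadata` is hypothetical, and the sketch complements rather than replaces the vendor guidance above.

```python
import struct

# Markers whose segments are dropped: APP1 (EXIF and XMP) and COM (free-text comments).
DROP = {0xE1, 0xFE}

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with APP1 and COM segments removed."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG file")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: copy the remainder as-is
            break
        # Segment length is big-endian and includes its own two length bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker not in DROP:
            out += data[pos:end]  # keep JFIF, quantization tables, Huffman tables, etc.
        pos = end
    out += data[pos:]
    return bytes(out)
```

In practice the cleaned bytes would be written to a new file so the original is kept. Two caveats: removing EXIF also removes the orientation tag, so some photos may display rotated, and other metadata containers such as APP13 (Photoshop/IPTC) blocks are left in place by this sketch.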
Technical Information: Pwning the infrastructure
While attackers will use OSINT to hunt for prospective marks, they'll also look at applications and scripts used by the target organization's website(s). Attackers will probe a target's entire network for flaws, so applications and scripts are not the only attack surface; they're just the easiest ones to access.
As mentioned, knowing the types of software used by the target is valuable (e.g., Office, Adobe Reader), but so are IP addresses, webserver specs (such as platform versions), web hosting information, and the types of hardware (e.g., routers and servers) used on the network.
Platform version numbers help the attacker identify known vulnerabilities, while hardware information can be used to look up default credentials. As for scripts and website development, an attacker will scan for logic flaws, Cross-Site Scripting, SQL Injection, and other vulnerabilities.
Another avenue of technical reconnaissance is the supply chain. Many organizations list their business partners publicly, which to an attacker equates to another relationship to exploit. Consider: if a reseller's account is compromised, how could that affect your organization?
At this point, it should be clear that the opening salvo in an APT-based campaign is information gathering. This is a key difference between a targeted attack and the generic attacks that most organizations face day to day. Generic attacks rely on volume, so the attacker doesn't really care who clicks a link or opens an attachment.
Sometimes the best summation is a simple checklist. Here's an outline of what an attacker will be looking for when it comes to reconnaissance operations.
OSINT Data (publicly posted data from the target's domains)
- Downloadable documents
  - These offer direct information as well as the chance to collect metadata.
- Employee images and corporate event images
  - These offer direct information as well as the chance to collect metadata.
- Staff directories and leadership / management profiles
  - Used to know who is who, and to establish relationships within the company.
- Projects and product data
  - This data can be helpful when researching attack surfaces and background information.
- B2B Relationships
  - This data is used to establish the supply chain relationships and sales channels for later exploitation if needed.
- Employee details
  - This includes developing profiles with personal and public data from social media sources such as Facebook, LinkedIn, Twitter, and blogs.
- Software data
  - The types of software used within the targeted organization, including the OS and third-party software; often collected via metadata.
Building a rounded personal profile
A full profile on a person will include: full name; address (past and present); phone numbers (personal and work); date of birth; Social Security number; ISP data (IP address, provider); usernames; passwords; public records data (taxes, credit history, legal records); hobbies; favorite eateries, movies, and books; and more.
While criminals will attempt to gather all of this data, the amount of profile information needed for a given campaign differs from case to case. However, the more there is, the more leverage an attacker has when making contact with someone. No piece of information is too small when building a profile.
Building a rounded technical profile
A technical profile for a targeted organization will include network maps, technical details obtained from metadata, IP addresses, available hardware and software information, operating system details, platform development data, and authentication measures such as how network IDs are created.
Combined with the personal profile data, this information allows the attacker to target the helpdesk. By the same token, knowing how network IDs are created also helps establish how email addresses are created, making phishing, guessing addresses, or initial communication easier. As for operating systems, third-party software, and platform data, the attacker can hunt for vulnerabilities or default access.
Web applications are checked for common vulnerabilities, including Cross-Site Scripting, SQL Injection, Remote or Local File Inclusion, and logic flaws. Likewise, armed with B2B data, the attacker checks channel and partner-based apps for the same flaws. The idea is to exploit the supply chain in order to gain access to the targeted organization.
Data gathering resources:
The sites below are used to collect profile information during the reconnaissance phase, and each new bit of information exposed leads to more searching and more information for an attacker to leverage. Social media profiles, for example, lead to names and images.
People, no matter how private they are, leave something of themselves behind on the Internet. Many are unknowingly exposed thanks to public record searches, as well as massive indexes of information available for next to nothing via data brokers.
Attackers know where to look. Depending on the target, some attackers are well funded and will pay for information or information services. Note, however, that this is not an exhaustive list of resources; these are simply the ones mentioned to CSO during various conversations.
Google (www.google.com)
This includes all of Google's services, for example Google Maps, Google Groups, Blogger, YouTube, and Google+. When searching for information, Google should always be the first stop.
People / Business Searches
These sites offer public information searches on people, businesses, and the connections between them. The best results will come from using all of them and creating two information chains. The first will hold all of the common data found on all of the indexes, and the other will be the data that didn't match up. The unmatched data should be checked for authenticity.
Zoom Info (www.zoominfo.com)
PIPL (www.pipl.com)
Intelius (www.intelius.com)
Muckety (www.muckety.com)
Other search resources
Web Archive ()
Also known as the Internet Archive, it can be used to discover older copies of comments, articles, websites, and profiles. This is useful for tracking someone or something over time.
GeoIP (www.geoiptool.com)
This is a basic IP address mapping website, used if the target's IP address is known. It can also be used to confirm a location suggested by other collected data points.
Robtex (www.robtex.com)
Robtex is a useful search engine for mapping DNS data, domain information, and routing information. It is often used to see which websites share the same IP addresses or nameservers. This is useful when the attacker wants to compromise a domain hosted on the same server as the target's domain (shared hosting / co-lo environments), which can give them indirect access.
KnowEm (www.knowem.com)