8 top open source intelligence tools

Find sensitive public info before the bad guys do

man typing on laptop search internet web browswer
Getty Images

OSINT definition

OSINT, or open source intelligence, is the practice of collecting information from published or otherwise publicly available sources. OSINT operations, whether practiced by IT security pros, malicious hackers, or state-sanctioned intelligence operatives, use advanced techniques to search through the vast haystack of visible data to find the needles they're looking for to achieve their goals—and learn information that many don't realize is public. Open source in this context doesn't refer to the open source software movement, although many OSINT tools are open source; instead, it describes the public nature of the data being analyzed.

OSINT is in many ways the mirror image of operational security, or OPSEC, which is the security process by which organizations protect public data about themselves that could, if properly analyzed, reveal damaging truths. IT security departments are increasingly tasked with performing OSINT operations on their own organizations in order to shore up operational security.

OSINT history: From spycraft to IT

During the 1980s, the military and intelligence services began to shift some of their information-gathering activities away from covert activities like trying to read an adversary’s mail or tapping their phones to discover hidden secrets. Instead, effort was put into looking for useful intelligence that was freely available or even officially published.

The world at the time was changing, and even though social media had not yet made the scene, there were plenty of sources like newspapers and publicly available databases that contained interesting and sometimes useful information, especially if someone knew how to connect a lot of dots. The term OSINT was originally coined to refer to this kind of spycraft.

But these same techniques can now be applied to cybersecurity. Most organizations have vast, public-facing infrastructures that span many networks, technologies, hosting services and namespaces. Information can be stored on employee desktops, in legacy on-prem servers, with employee-owned BYOD devices, in the cloud, embedded inside devices like webcams, or even hidden in the source code of active apps and programs.

In fact, the IT staff at large companies almost never knows about every asset in their enterprise, public or not. Add in the fact that many organizations also own or control several additional assets indirectly, such as their social media accounts, and there is potentially a lot of information sitting out there that could be dangerous in the wrong hands.

Why is OSINT important?

OSINT is crucial in keeping tabs on that information chaos. There are three main important tasks within OSINT that IT needs to fulfill, and a wide range of OSINT tools have been developed to help meet those needs. Most tools serve all three functions, though many excel in one particular area.

Discovering public-facing assets
Their most common function is helping IT teams discover public facing assets and mapping what information each possesses that could contribute to a potential attack surface. In general, they don’t try to look for things like program vulnerabilities or perform penetration testing. Their main job is recording what information someone could publicly find on or about company assets without resorting to hacking.

Discover relevant information outside the organization
A secondary function that some OSINT tools perform is looking for relevant information outside of an organization, such as in social media posts or at domains and locations that might be outside of a tightly defined network. Organizations that have made a lot of acquisitions, bringing along the IT assets of the company they are merging with, could find this function very useful. Given the extreme growth and popularity of social media, looking outside the company perimeter for sensitive information is probably helpful for just about any group.

Collate discovered information into actionable form
Finally, some OSINT tools help to collate and group all the discovered information into useful and actionable intelligence. Running an OSINT scan for a large enterprise can yield hundreds of thousands of results, especially if both internal and external assets are included. Piecing all that data together and being able to deal with the most serious problems first can be extremely helpful.

Top OSINT tools

Using the right OSINT tool for your organization can improve cybersecurity by helping to discover information about your company, employees, IT assets and other confidential or sensitive data that could be exploited by an attacker. Discovering that information first and then hiding or removing it could reduce everything from phishing to denial-of-service attacks.

Following (in no particular order) are some of the top tools used for OSINT, what areas they specialize in, why they are unique and different from one another, and what specific value they might be able to bring to an organization’s cybersecurity efforts.

  • Maltego
  • Recon-ng
  • theHarvester
  • Shodan
  • Metagoofil
  • Searchcode
  • SpiderFoot
  • Babel X

Maltego
Maltego specializes in uncovering relationships among people, companies, domains and publicly accessible information on the internet. It’s also known for taking the sometimes enormous amount of discovered information and plotting it all out in easy-to-read charts and graphs. The graphs do a good job of taking raw intelligence and making it actionable, and each graph can have up to 10,000 data points.

The Maltego program works by automating the searching of different public data sources, so users can click on one button and execute multiple queries. A search plan is called a “transform action” by the program, and Maltego comes with quite a few by default that include common sources of public information like DNS records, whois records, search engines and social networks. Because the program is using public interfaces to perform its searching, it’s compatible with almost any source of information that has a public interface, so adding more searches to a transform action or making up a whole new one is easily possible.

Once the information is gathered, Maltego makes connections that can unmask the hidden relationships between names, email addresses, aliases, companies, websites, document owners, affiliations and other information that might prove useful in an investigation, or to look for potential future problems. The program itself runs in Java, so it works with Windows, Mac and Linux platforms.

There is a free version of the program with limited features called Maltego CE. Desktop versions of Maltego XL run $1,999 per instance. Server installations for large-scale commercial use start at $40,000 and come with a complete training program.

Recon-ng
Developers who work in Python have access to a powerful tool in Recon-ng, which is written in that language. Its interface looks very similar to the popular Metasploit Framework, which should reduce the learning curve for those who have experience with it. It also has an interactive help function, which many Python modules lack, so developers should be able to pick it up quickly.

Recon-ng automates time-consuming OSINT activities, like cutting and pasting. Recon-ng doesn't claim that all OSINT gathering can be conducted by its tool, but it can be used to automate much of the most popular kinds of harvesting, leaving more time for the things that still must be done manually.

Designed so that even the most junior Python developers can create searches of publicly available data and return good results, it has a very modular framework with a lot of built-in functionality. Common tasks like standardizing output, interacting with databases, making web requests and managing API keys are all part of the interface. Instead of programming Recon-ng to perform searches, developers simply choose which functions they want it to perform and build an automated module in just a few minutes.

Recon-ng is free, open-source software. The available wiki includes comprehensive information for getting started with the tool as well as best practices for using it.

theHarvester
One of the simplest tools to use on this list, theHarvester is designed to capture public information that exists outside of an organization’s owned network. It can find incidental things on internal networks as well, but the majority of tools that it uses are outward facing. It would be effective as a reconnaissance step prior to penetration testing or similar exercises.

The sources that theHarvester uses include popular search engines like Bing and Google, as well as lesser known ones like dogpile, DNSdumpster and the Exalead meta data engine. It also uses Netcraft Data Mining and the AlienVault Open Threat Exchange. It can even tap the Shodan search engine to discover open ports on discovered hosts. In general, theHarvester tool gathers emails, names, subdomains, IPs and URLs.

TheHarvester can access most public sources without any special preparations. However, a few of the sources used require an API key. You must also have Python 3.6 or better in your environment.

Anyone can obtain theHarvester on GitHub. It’s recommended that you use a virtualenv to create an isolated Python environment when cloning it from there.

Shodan
Shodan is a dedicated search engine used to find intelligence about devices like the billions that make up the internet of things (IoT) that are not often searchable, but happen to be everywhere these days. It can also be used to find things like open ports and vulnerabilities on targeted systems. Some other OSINT tools like theHarvester use it as a data source, though deep interaction with Shodan requires a paid account.

The number of places that Shodan can monitor and search as part of an OSINT effort is impressive. It’s one of the few engines capable of examining operational technology (OT) such as the kind used in industrial control systems at places like power plants and manufacturing facilities. Any OSINT gathering effort in industries that deploy both information technology and OT would miss a huge chunk of that infrastructure without a tool like Shodan.

In addition to IoT devices like cameras, building sensors and security devices, Shodan can also be turned to look at things like databases to see if any information is publicly accessible through paths other than the main interface. It can even work with videogames, discovering things like Minecraft or Counter-Strike: Global Offensive servers hiding on corporate networks where they should not be, and what vulnerabilities they generate.

Anyone can purchase a Freelancer license and use Shodan to scan up to 5,120 IP addresses per month, with a return of up to a million results. That costs $59 per month. Serious users can buy a Corporate license, which provides unlimited results and scanning of up to 300,000 IPs monthly. The Corporate version, which costs $899 per month, includes a vulnerability search filter and premium support.

Metagoofil
Another freely available tool on GitHub, Metagoofil is optimized to extract metadata from public documents. Metagoofil can investigate almost any kind of document that it can reach through public channels including .pfd, .doc, .ppt, .xls and many others.

The amount of interesting data that Metagoofil can gather is impressive. Searches return things like the usernames associated with discovered documents, as well as real names if available. It also maps the paths of how to get to those documents, which in turn would provide things like server names, shared resources and directory tree information about the host organization.

Everything that Metagoofil finds would be very useful for a hacker, who could use it to do things like launch brute-force password attacks or even phishing emails. Organizations that want to protect themselves could instead take the same OSINT gathered information and protect or hide it before a malicious actor can take the initiative.

searchcode
For those who need to go really deep into the complex matrix of OSINT gathering, searchcode is a highly specialized search engine that looks for useful intelligence inside source code. This powerful engine is surprisingly the work of a single developer.

Because a repository of code needs to be first added to the program before becoming searchable, searchcode straddles the line between an OSINT tool and one designed to find things other than public information. However, it can still be considered an OSINT tool because developers can use it to discover problems associated with having sensitive information accessible inside code on either running apps or those that are still in development. In the latter case, those problems could be fixed prior to deployment into a production environment.

Although anything involving code is going to require more knowledge than, say, a Google search, searchcode does a great job of making its interface as easy to use as possible. Users simply type in their search fields and searchcode returns relevant results with search terms highlighted in the lines of code. Suggested searches include usernames, security flaws like eval $_GET calls, unwanted active functions like re.compile and special characters that can be used to launch code injection attacks.

1 2 Page 1
Page 1 of 2
The 10 most powerful cybersecurity companies