What is theHarvester used for?
theHarvester is a command-line tool included in Kali Linux. It acts as a wrapper around a variety of search engines and is used to find email accounts, subdomain names, virtual hosts, open ports and banners, and employee names related to a domain, drawing on public sources such as search engines and PGP key servers.
Email harvesting is the process of extracting email addresses from public sources. Harvesters obtain email addresses in several ways, including buying or trading lists and using bots to scrape web pages for addresses.
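The scraping step can be illustrated with a minimal Ruby sketch. It assumes the page text has already been fetched, and the regular expression is a pragmatic approximation rather than a full RFC 5322 parser. The names `extract_emails` and `emails_for_domain` are hypothetical and are not part of theHarvester.

```ruby
# Pragmatic (not RFC 5322-complete) pattern for addresses in scraped text.
# The group is non-capturing, so String#scan returns whole matches.
EMAIL_PATTERN = /\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b/

# Return the unique, lowercased email addresses found in a block of text.
def extract_emails(text)
  text.scan(EMAIL_PATTERN).map(&:downcase).uniq
end

# Keep only addresses whose host is the target domain or one of its subdomains.
def emails_for_domain(text, domain)
  domain = domain.downcase
  extract_emails(text).select do |email|
    host = email.split("@", 2).last
    host == domain || host.end_with?(".#{domain}")
  end
end

page = <<~HTML
  <p>Contact sales@Example.com or support@mail.example.com.</p>
  <p>Partner: info@other.org</p>
HTML

p extract_emails(page)                   # => ["sales@example.com", "support@mail.example.com", "info@other.org"]
p emails_for_domain(page, "example.com") # => ["sales@example.com", "support@mail.example.com"]
```

Filtering by domain suffix rather than plain substring matching avoids counting look-alike hosts such as `notexample.com`.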
I modified the Kali Linux email harvester in Ruby so that it could search a large list of domains or keywords and save the results in CSV files. To avoid being banned by search engines, the code used a simple approach: it waited for the VPN to switch to a new IP address before continuing.
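The batch search with IP rotation could look roughly like the following Ruby sketch. It is not the code from the repository: the `search` callable (term in, email addresses out), the `current_ip` callable (for example, one that queries a public IP-echo service), the one-CSV-per-term layout, and the polling interval and timeout are all assumptions made for illustration.

```ruby
require "csv"
require "tmpdir"

# Block until `current_ip` (hypothetical callable) reports an address that
# differs from `old_ip`, polling every `interval` seconds, up to `timeout`.
def wait_for_new_ip(old_ip, current_ip:, interval: 5, timeout: 300)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    ip = current_ip.call
    return ip if ip && ip != old_ip
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "IP address did not change within #{timeout}s"
    end
    sleep interval
  end
end

# For each term, run `search` (hypothetical callable), write the results to
# its own CSV file, then wait for a fresh IP before the next query.
def harvest_all(terms, out_dir:, search:, current_ip:, interval: 5)
  ip = current_ip.call
  terms.each_with_index do |term, i|
    emails = search.call(term)
    path = File.join(out_dir, "#{term.gsub(/[^\w.-]/, '_')}.csv")
    CSV.open(path, "w") do |csv|
      csv << %w[term email]
      emails.each { |email| csv << [term, email] }
    end
    # Rotate only between terms, not after the last one.
    ip = wait_for_new_ip(ip, current_ip: current_ip, interval: interval) if i < terms.size - 1
  end
end

# Demo with stubbed search and IP lookup (no network access).
ips = %w[1.1.1.1 1.1.1.1 2.2.2.2].each
Dir.mktmpdir do |dir|
  harvest_all(%w[acme.com example.org], out_dir: dir,
              search: ->(term) { ["info@#{term}"] },
              current_ip: -> { ips.next }, interval: 0)
  p Dir.children(dir).sort # => ["acme.com.csv", "example.org.csv"]
end
```

Injecting `search` and `current_ip` as callables keeps the rotation logic testable without a live VPN or search engine.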
The purpose of the project was to collect email addresses for companies working in specific fields, e.g. automation.
The VPN was provided by PIA (Private Internet Access), which uses OpenVPN. The pia-scheduler shell script is responsible for connecting to the VPN and switching IP addresses.
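The IP-switching idea can be sketched in Ruby as a round-robin over OpenVPN server configs: stopping the tunnel and reconnecting through a different server usually yields a different public IP. This is a hypothetical illustration, not the pia-scheduler script. The `VpnRotator` class and the `.ovpn` file names are invented, and the default starter assumes `openvpn` is on the PATH and runs with sufficient privileges (PIA credentials would typically also have to be supplied, e.g. through the config file).

```ruby
# Hypothetical round-robin rotation over OpenVPN server configs.
class VpnRotator
  attr_reader :current_config

  # config_paths: one .ovpn file per server (assumed layout).
  # start/stop are injectable so the rotation can be tested without root.
  def initialize(config_paths, start: nil, stop: nil)
    raise ArgumentError, "no VPN configs given" if config_paths.empty?
    @configs = config_paths.cycle
    @start = start || ->(config) { Process.spawn("openvpn", "--config", config) }
    @stop = stop || method(:terminate)
    @pid = nil
    @current_config = nil
  end

  # Tear down the current tunnel (if any) and bring up the next server.
  def rotate!
    @stop.call(@pid) if @pid
    @current_config = @configs.next
    @pid = @start.call(@current_config)
    @current_config
  end

  private

  # Default stop: send SIGTERM to the openvpn process and reap it.
  def terminate(pid)
    Process.kill("TERM", pid)
    Process.wait(pid)
  rescue Errno::ESRCH, Errno::ECHILD
    nil # the process had already exited
  end
end

# Demo with fake start/stop hooks (does not launch openvpn).
started = []
stopped = []
pids = (100..).each
rotator = VpnRotator.new(%w[us_east.ovpn uk_london.ovpn],
                         start: ->(cfg) { started << cfg; pids.next },
                         stop: ->(pid) { stopped << pid })
3.times { rotator.rotate! }
p started # => ["us_east.ovpn", "uk_london.ovpn", "us_east.ovpn"]
p stopped # => [100, 101]
```

In the harvester loop, a call to `rotate!` would pair naturally with waiting until the public IP actually changes before sending the next query.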
The code can be viewed in the GitHub repository: https://github.com/elmalla/search_engine_email_harvester