Hasty Scripts: Go Phish

With 2017 in full swing, the ever-maligned and down-on-his-luck Nigerian Prince is as hard at work as ever. While tried and true, phishing scams have continued to evolve with the times, even if the underlying method of carrying out the scam is the same. What’s one to do other than review emails for bad punctuation, suspicious attachments, and ludicrous scenarios?

For larger mailboxes and without a timeframe to go on, this can become a time-intensive and tedious task. It would be better if we could throw some automation at this to highlight standout emails to get the ball rolling. Fortunately, libpff, one of the many libyal libraries, and its Python bindings, pypff, can be used to manipulate PSTs or OSTs.

In this post, we’ll develop a small script showing off some of the basic functionality of pypff. Specifically, we will compare email headers to identify emails where the “From” and “Reply-To” or “Return-Path” addresses do not match. Emails where the “From” address differs from the “Reply-To” or “Return-Path” address could be indicative of spoofing and deserve a second look. We will write the metadata of emails falling under this category to a CSV to quickly highlight potentially troublesome messages.

Currently, the Python bindings for libpff are in development and lack some features, such as attachment support. That said, we can easily review other aspects of an email in a PST using supported methods. Because this is an actively developing library, we are using the latest stable release, libpff-experimental-20161119, to develop our code as methods we rely on may change from release to release.

Installing pypff

Please see the instructions below for installing pypff with Python 2.X on a fresh Ubuntu VM install. We will be following the build instructions for libpff, which can be viewed on the GitHub wiki. In a clean install of Ubuntu, open a terminal and execute the following:

  1. sudo apt-get -y update && sudo apt-get -y upgrade (this may take a while)
  2. sudo apt-get -y install build-essential debhelper fakeroot autotools-dev zlib1g-dev python-dev python-pip
  3. sudo pip install unicodecsv
  4. Download libpff-experimental-20161119 and extract its contents
  5. Navigate to the extracted directory and execute sudo python setup.py install

This should successfully install pypff for Python 2.X. If you would rather install pypff for Python 3.X, first install python3-dev and then execute the last statement using python3 rather than python.

PST Go Phish

This script can be accessed from the blog GitHub account. Let’s begin by talking about the inputs this script accepts. Our first two arguments, the PST_FILE and OUTPUT_DIR, are self-explanatory. The optional ignore argument can be used to filter the emails being compared. While developing, I noticed a great number of “suspicious” emails in the output due to bounce lists. This argument will allow you to remove these and other known good emails. However, I recommend running the script without ignoring anything and reviewing the results in the CSV instead.

ubuntu@ubuntu:~/Desktop$ python pst_go_phish.py -h
usage: pst_go_phish.py [-h] [-i IGNORE] PST_FILE OUTPUT_DIR

PST Go Phishing..

positional arguments:
  PST_FILE              File path to input PST file
  OUTPUT_DIR            Output Dir for CSV

optional arguments:
  -h, --help            show this help message and exit
  -i IGNORE, --ignore IGNORE
                        Comma-delimited acceptable emails to ignore e.g.
                        (bounce lists, etc.)
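
The help text above is the kind of output argparse generates. As a minimal sketch (the helper names build_parser and parse_ignore are hypothetical, not taken from the script), the arguments could be declared and the ignore string normalized like this:

```python
import argparse


def build_parser():
    # Argument names and help strings are copied from the usage text above;
    # how the real script wires them up is an assumption.
    parser = argparse.ArgumentParser(description="PST Go Phishing..")
    parser.add_argument("PST_FILE", help="File path to input PST file")
    parser.add_argument("OUTPUT_DIR", help="Output Dir for CSV")
    parser.add_argument(
        "-i", "--ignore",
        help="Comma-delimited acceptable emails to ignore e.g. "
             "(bounce lists, etc.)")
    return parser


def parse_ignore(raw):
    # Turn "A@x.com, b@y.org" into ["a@x.com", "b@y.org"]; None means no filter.
    if raw is None:
        return []
    return [x.strip().lower() for x in raw.split(",") if x.strip()]


# Made-up command line for illustration only.
args = build_parser().parse_args(
    ["mail.pst", "out", "-i", "Bounce@lists.example.com, noreply@example.org"])
print(parse_ignore(args.ignore))
```

Lowercasing the ignore entries matters because the script lowercases every header before comparing.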

Let’s take a look at some of the more important functions and lines of code. Using pypff, we open the PST or OST file and use the get_root_folder() function to get the root element of the storage table. We use list comprehension to quickly split out all of the ignore arguments passed into the function and store it in the ignore list. After these actions, we call the recursePST() function to iterate through the file and find potentially suspicious messages. Once we finish recursing through the PST we call the csvWriter() function to write our output.

def main(pst_file, output_dir, ig):
	print "[+] Accessing {} PST file..".format(pst_file)
	pst = pypff.open(pst_file)
	root = pst.get_root_folder()
	print "[+] Traversing PST folder structure.."
	if ig is not None:
		ignore = [x.strip().lower() for x in ig.split(',')]
	else:
		# No ignore argument supplied, so nothing is filtered out
		ignore = []
	recursePST(root, ignore)
	# Write the collected "suspicious" messages to the CSV
	csvWriter(output_dir)

The recursePST() function is a simple recursive function that iterates through every folder within the PST and all of its sub-folders. For each folder, we call the processMessages() function. This function will search for emails we can compare (i.e., those with reply-to and/or return-path headers) and send them further along.

def recursePST(base, ignore):
	for folder in base.sub_folders:
		if folder.number_of_sub_folders:
			recursePST(folder, ignore)
		processMessages(folder, ignore)
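
To see the order in which folders are visited without needing a PST on hand, here is a small sketch that runs the same traversal over stand-in objects. FakeFolder and walk_folders are hypothetical names used only for this illustration; the one thing borrowed from the script is the sub_folders attribute.

```python
class FakeFolder(object):
    """Stand-in for a pypff folder, exposing only what the walk needs."""

    def __init__(self, name, sub_folders=()):
        self.name = name
        self.sub_folders = list(sub_folders)


def walk_folders(base):
    # Same order as recursePST(): descend into a folder's children first,
    # then hand back the folder itself. The base folder is never yielded.
    for folder in base.sub_folders:
        for nested in walk_folders(folder):
            yield nested
        yield folder


root = FakeFolder("root", [
    FakeFolder("Inbox", [FakeFolder("Receipts")]),
    FakeFolder("Sent"),
])
print([f.name for f in walk_folders(root)])
```

Note that messages stored directly in the starting folder are not visited by this pattern, which is usually fine for a PST root folder.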

We iterate through the messages in the input folder and, with the help of the get_transport_headers() function, inspect the email headers for each of them. This function returns a single string in which newline characters separate the headers. For this reason, we use the splitlines() method to quickly create a list of strings representing the different headers.

We cycle through that list and identify the strings corresponding to the from, reply-to, and return-path headers. If the from header is missing, or if neither a reply-to nor a return-path header is present, we skip the message. If we have the from header and at least one of the others, we call the compareMessage() function to actually identify potentially suspicious messages.

def processMessages(folder, ignore):
	global messages
	print "[+] Processing Folder: {}".format(folder.name)
	for message in folder.sub_messages:
		eml_from, replyto, returnpath = ("", "", "")
		messages += 1
		try:
			headers = message.get_transport_headers().splitlines()
		except AttributeError:
			# No email header
			continue
		for header in headers:
			if header.strip().lower().startswith("from:"):
				eml_from = header.strip().lower()
			elif header.strip().lower().startswith("reply-to:"):
				replyto = header.strip().lower()
			elif header.strip().lower().startswith("return-path:"):
				returnpath = header.strip().lower()
		if eml_from == "" or (replyto == "" and returnpath == ""):
			# No FROM value or no Reply-To / Return-Path value
			continue

		compareMessage(folder, message, eml_from, replyto, returnpath, ignore)
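
One caveat with a line-by-line scan is that a long header can be folded across several lines, in which case the address may sit on a continuation line. As an alternative sketch (not what the script does), the standard library's header parser can pull out the same three headers from the transport header string. The function name pick_headers and the sample headers are made up for illustration, and unlike the script this returns only the header values, without the "from:" style prefix.

```python
from email.parser import HeaderParser


def pick_headers(transport_headers):
    # Parse the raw header block; header name lookups are case-insensitive.
    parsed = HeaderParser().parsestr(transport_headers)
    wanted = {}
    for name in ("From", "Reply-To", "Return-Path"):
        value = parsed.get(name)
        if value is None:
            wanted[name.lower()] = ""
        else:
            # A folded value can still contain the fold's line break,
            # so collapse every run of whitespace into a single space.
            wanted[name.lower()] = " ".join(str(value).split()).lower()
    return wanted


# Made-up headers; the From header is folded onto a second line.
sample = (
    "Return-Path: <bounce@mailer.example.net>\r\n"
    "From: John Doe\r\n"
    " <jdoe@example.com>\r\n"
    "Subject: Invoice\r\n"
    "\r\n"
)
print(pick_headers(sample))
```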

Before we start the comparison, we need to extract the email address from each of the present headers. Normally, you may see a header like “From: John Doe” followed by the address in angle brackets; the emailExtractor() function extracts the email address and domain from the header. We perform this operation for all three headers, if present, and set booleans to True so we know whether the reply-to and / or the return-path headers are to be compared based on the inputs. After this, we are ready to make our comparisons.

def compareMessage(folder, msg, eml_from, reply, return_path, ignore):
	# Default to "header not present" so the later checks are always defined
	reply_bool, return_bool = False, False
	from_email, from_domain = emailExtractor(eml_from)
	if reply != "":
		reply_bool = True
		reply_email, reply_domain = emailExtractor(reply)
	if return_path != "":
		return_bool = True
		return_email, return_domain = emailExtractor(return_path)
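
The body of emailExtractor() is not shown in this post. As a rough sketch of what such a helper might look like, the stand-in below (email_extractor is a hypothetical name, and its behavior is an assumption) uses the standard library's parseaddr and returns False for both values when no address is found, matching the False checks made during the comparison.

```python
from email.utils import parseaddr


def email_extractor(header_line):
    # The script passes the whole lowercased line, e.g. "from: john doe <...>",
    # so split off the header name at the first colon.
    _, _, value = header_line.partition(":")
    # parseaddr turns "john doe <jdoe@example.com>" into (name, address).
    _, address = parseaddr(value.strip())
    if "@" not in address:
        # Nothing usable; signal failure the way compareMessage() expects.
        return False, False
    domain = address.rsplit("@", 1)[1]
    return address, domain


# Made-up header lines for illustration.
print(email_extractor("from: john doe <jdoe@example.com>"))
print(email_extractor("return-path: <bounce@mailer.example.net>"))
```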

We compare the return-path and reply-to headers in the exact same manner. Let’s look at one as an example. For the return-path header, we verify that it was present, then that a domain could be extracted from both it and the from header, and lastly that it is not one of the emails we should be ignoring. If it passes those tests, we flag the message for a further look when the two email domains (e.g., somemail.com) are not the same. If desirable, we could instead compare from_email and return_email to test for an exact match. However, I elected to allow messages where the sender name differed but the domain was the same.

	if return_bool is True:
		if from_domain != False and return_domain != False:
			for igno in ignore:
				if igno in return_email:
					ignored_messages += 1
					# Known good address, so stop processing this message
					return
			if from_domain != return_domain:
				suspicious = True
				found_suspicious = "Return-Path"
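
The decision for a single header pair can also be written as one small standalone function, which makes the three outcomes easier to test in isolation. This is only a sketch of the logic described above; flag_mismatch and its return strings are hypothetical and do not appear in the script.

```python
def flag_mismatch(from_domain, other_email, other_domain, ignore):
    """Classify one From vs. Reply-To / Return-Path pair."""
    if from_domain is False or other_domain is False:
        # A domain could not be extracted, so there is nothing to compare.
        return "ok"
    if any(entry in other_email for entry in ignore):
        # Substring match against the ignore list, as in the script.
        return "ignored"
    return "suspicious" if from_domain != other_domain else "ok"


# Made-up addresses for illustration.
print(flag_mismatch("example.com", "bounce@mailer.example.net",
                    "mailer.example.net", []))
print(flag_mismatch("example.com", "bounce@mailer.example.net",
                    "mailer.example.net", ["mailer.example.net"]))
```

Because the ignore check is a substring match, an entry such as a bare domain will also silence every address at that domain.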

If an email is found to be “suspicious”, we increment the suspicious message counter by one (which is printed later along with many other counts) and append the desired fields to the global message_list.

            if suspicious is True:
                suspicious_messages += 1
                message_list.append([folder.name, msg.get_subject(), msg.get_sender_name(), msg.number_of_attachments, from_email, return_email, reply_email, found_suspicious])

Lastly, the csvWriter() function, in a few lines of code, writes the global message_list we have been adding “suspicious” messages to out to a CSV file. With this CSV file, one can quickly start applying filters and all the usual Excel magic to further eliminate false positives. One thing that will likely be useful is to filter on the “Attachments” column and only show messages with one or more.

def csvWriter(output_dir):
	global message_list
	headers = ["Folder", "Subject", "Sender", "Attachments", "From Email", "Return-Path", "Reply-To", "Flag"]
	with open(os.path.join(output_dir, "go_phish.csv"), "wb") as csvfile:
		csv_writer = csv.writer(csvfile)
		# Header row first, then one row per flagged message
		csv_writer.writerow(headers)
		csv_writer.writerows(message_list)
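
If you port this to Python 3 and use the standard library csv module rather than unicodecsv, the file has to be opened in text mode instead of "wb". A minimal sketch under that assumption follows; write_report is a hypothetical name and the sample row is made up.

```python
import csv
import os
import tempfile


def write_report(output_dir, rows):
    headers = ["Folder", "Subject", "Sender", "Attachments", "From Email",
               "Return-Path", "Reply-To", "Flag"]
    path = os.path.join(output_dir, "go_phish.csv")
    # Python 3: text mode with newline="" lets the csv module control line
    # endings; utf-8 keeps non-ASCII subjects and names intact.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headers)
        writer.writerows(rows)
    return path


sample_rows = [["Inbox", "Invoice", "John Doe", 1, "jdoe@example.com",
                "bounce@mailer.example.net", "", "Return-Path"]]
print(write_report(tempfile.mkdtemp(), sample_rows))
```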

Those are the highlights for this script. The idea with this script is to show off some of the functionality of the pypff library and provide, hopefully, a useful triage script. As I developed the code, I thought of a simpler, but likely just as valuable, script idea using this library. If you would like to practice your Python development chops, write a script that iterates through a PST or OST, identifies all messages from every unique email address / domain, and then highlights only those email addresses / domains that have exactly one message associated with them where that message has one or more attachments.

The idea is that phishing emails are typically one-offs, and if there are additional attempts they are normally made with different email addresses. Ensuring there are one or more attachments weeds out likely benign messages (unless they use a link rather than an attachment).
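
If you want a starting point for that exercise, the counting step might look something like the sketch below. It works on plain (sender, attachment count) tuples so it can be tried without a PST; collecting those tuples with pypff is left to you, and one_off_senders is a hypothetical name.

```python
from collections import defaultdict


def one_off_senders(messages):
    """messages: iterable of (sender_email, attachment_count) tuples."""
    seen = defaultdict(list)
    for sender, attachments in messages:
        seen[sender.strip().lower()].append(attachments)
    # Keep senders with exactly one message that carries an attachment.
    return sorted(sender for sender, counts in seen.items()
                  if len(counts) == 1 and counts[0] >= 1)


# Made-up data for illustration.
inbox = [
    ("boss@example.com", 0),
    ("boss@example.com", 2),
    ("prince@example.net", 1),
    ("newsletter@example.org", 0),
]
print(one_off_senders(inbox))
```

Swapping the sender address for its domain as the dictionary key gives the per-domain variant.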

That’s it for this post and script! If you’d like to give it a go, feel free to access it on GitHub. Do you have other quick methods you use when searching for the proverbial needle in the haystack? Let me know below.
