Original Author: Ogheneyoma Okobiah.

Source: yomaokobiah.com


There is a lot of data out there, and most of it is unstructured. Emails are a great source of communication data, so there is no limit to what we can harness from them. By the end of this tutorial, you will be able to get email data for insights.

Prerequisites

  • Familiarity with Python 3

  • Pandas

  • Matplotlib

  • Seaborn

  • Wordcloud

  • A Gmail account

Getting The Data

There are several ways to achieve the aim of this article. Below is how I did it.

This tutorial uses a Gmail account. For the imaplib script to work, two changes have to be made to the account: enabling IMAP and turning on access for less secure apps.

  • To enable IMAP, first open Gmail, then click on the settings icon and click Settings. Click on the Forwarding and POP/IMAP tab. In the "IMAP Access" section, select Enable IMAP. Then click Save Changes. If you need more help, visit this Gmail help page.

  • To turn on less secure apps, navigate to your Google dashboard either by clicking on your account avatar in the upper right-hand corner of your screen and then clicking My Account, or by navigating to myaccount.google.com. Then choose Sign-in & security, scroll down until you see the option Allow less secure apps, and turn the access on.

Step 1: Importing the required libraries to get the email data.

  • imaplib is an Internet Message Access Protocol (IMAP) library

  • email is a Python library that parses, handles and generates email messages.

  • getpass is a Python library that contains utilities to get a password or the current username.

  • pandas is a Python library for data manipulation and analysis.

import imaplib
import email
import getpass
import pandas as pd

Step 2: Gaining access to the email address.

  • username is the email address.

  • password is the password to the email address when prompted. [If you don't want to use the getpass package, you can enter your password as a string.]

  • mail is the connection to the email server. The server address varies by provider; for this tutorial we're using Gmail.

  • mail.login is an attempt to log into the server using the provided credentials.

username =  input("Enter the email address: ")
password = getpass.getpass("Enter password: ")
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(username, password)

Step 3: Specifying the mailbox to get data from.

  • mail.list() is a method that gives a list of the mailboxes in the email account, e.g. inbox, drafts and so on.

  • mail.select() is a method that takes as its argument the mailbox you want to get data from.

print(mail.list())
mail.select("inbox")
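To make the output of mail.list() easier to read, the mailbox names can be pulled out of each response line. This is a minimal sketch that runs on a made-up sample rather than a live connection; the sample lines and the helper mailbox_name are assumptions for illustration, based on the usual shape of a LIST response (flags, delimiter, quoted name).

```python
# Illustrative sample of the lines returned by mail.list() on a Gmail account
# (assumed shape: flags, hierarchy delimiter, quoted mailbox name).
sample_list_response = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\HasChildren \\Noselect) "/" "[Gmail]"',
    b'(\\Drafts \\HasNoChildren) "/" "[Gmail]/Drafts"',
]


def mailbox_name(line):
    """Return the mailbox name from one line of a LIST response (hypothetical helper)."""
    decoded = line.decode("utf-8")
    # The name is the last field, after the quoted hierarchy delimiter
    return decoded.rsplit(' "/" ', 1)[-1].strip('"')


mailbox_names = [mailbox_name(line) for line in sample_list_response]
print(mailbox_names)
```

Any of the printed names can then be passed to mail.select().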

Step 4: Searching and Fetching the data.

  • Line 1: mail.uid() is a method whose first argument is the command you want to execute, in this case the command is "search". The rest of the arguments are used for the search. (Search gives from oldest to recent)

  • Line 1: result is the status of the command, while numbers is a list that contains a single object of type bytes.

  • Line 2: uids is a list of the individual IDs, obtained by splitting the bytes object in numbers.

  • Line 3: is a list of the decoded IDs, now as strings.

  • Line 4: is a slice of the most recent 100 items, newest first [recall that search orders them from oldest to most recent].

  • Line 5: the command we want to execute is "fetch", and its response is stored in messages. We're fetching the subject, sender and date headers of the messages based on the uids.

result, numbers = mail.uid('search', None, "ALL")
uids = numbers[0].split()
uids = [id.decode("utf-8") for id in uids ]
uids = uids[-1:-101:-1]
result, messages = mail.uid('fetch', ','.join(uids), '(BODY[HEADER.FIELDS (SUBJECT FROM DATE)])')
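The split, decode and slice steps are easier to follow on a small example. The sketch below uses a hypothetical search response with five UIDs and keeps the newest three instead of the newest 100; the values are made up for illustration.

```python
# Hypothetical SEARCH response: a list holding one bytes object with
# space-separated UIDs, ordered oldest to newest.
numbers = [b"101 102 103 104 105"]

uids = numbers[0].split()                      # list of bytes, one per UID
uids = [uid.decode("utf-8") for uid in uids]   # list of str
recent = uids[-1:-4:-1]                        # newest three, newest first
fetch_set = ",".join(recent)                   # comma-separated set passed to fetch

print(recent)
print(fetch_set)
```

The negative step in the slice walks backwards from the last item, which is why the newest message comes first.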

Step 5: Preparing the data to be exported.

  • Line 1-3: empty lists for the data we specified in messages.

  • Line 4: looping through the content of the messages we fetched. A step of two is used because the response alternates between a two-item tuple (holding the header data) and a closing b')' item, and we only want the tuples.

  • Line 5: parsing the bytes email to message object.

  • Line 6-11: the subject header may be encoded as bytes, so it is decoded into a string we can read.

  • Line 12: adding the dates to date_list.

  • Line 13-15: getting the sender detail, it's in the format "Sender name" <sender email address> hence the split and replace methods are used to get only the "Sender name".

  • Line 16-19: converting the objects in date_list to datetime objects. Because each time has its UTC offset attached, a new list was created and the offset was sliced off from each object in the list.

  • Line 20-22: checking the lengths of the created lists, because the columns of a dataframe must all be the same length.

  • Line 23-25: converting the lists to a dictionary and then a pandas dataframe, viewing it and saving it for download.

date_list = []
from_list = [] 
subject_text = []
for i, message in messages[::2]:
    msg = email.message_from_bytes(message)
    decode = email.header.decode_header(msg['Subject'])[0]
    if isinstance(decode[0],bytes):
        decoded = decode[0].decode()
        subject_text.append(decoded)
    else:
        subject_text.append(decode[0])
    date_list.append(msg.get('date'))
    fromlist = msg.get('From')
    fromlist = fromlist.split("<")[0].replace('"', '')
    from_list.append(fromlist)
date_list = pd.to_datetime(date_list)
date_list1 = []
for item in date_list:
    date_list1.append(item.isoformat(' ')[:-6])
print(len(subject_text))
print(len(from_list))
print(len(date_list1))
df = pd.DataFrame(data={'Date':date_list1, 'Sender':from_list, 'Subject':subject_text})
print(df.head())
df.to_csv('inbox_email.csv',index=False)
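To see what the parsing steps do without connecting to a mailbox, here is a small sketch that runs on a made-up header block shaped like the second item of each tuple in messages. The sample headers are assumptions, and email.utils.parseaddr is used as an alternative to the split and replace approach above; it is not what the original script does.

```python
import email
from email.header import decode_header
from email.utils import parseaddr

# Hypothetical raw header block, shaped like the data item of a fetched message
raw_headers = (
    b"Subject: =?UTF-8?B?SGVsbG8gV29ybGQ=?=\r\n"
    b'From: "Jane Doe" <jane@example.com>\r\n'
    b"Date: Mon, 18 May 2020 14:30:00 +0000\r\n\r\n"
)

msg = email.message_from_bytes(raw_headers)

# decode_header returns (value, charset) pairs; value is bytes when the header was encoded
value, charset = decode_header(msg["Subject"])[0]
subject = value.decode(charset or "utf-8") if isinstance(value, bytes) else value

# parseaddr splits the From header into (display name, email address)
sender_name, sender_address = parseaddr(msg.get("From"))

print(subject)
print(sender_name, sender_address)
```

Passing the charset reported by decode_header to decode() helps with subjects that are not UTF-8 encoded.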

Visualisation

Now that we have the email data in CSV format, we can read the data using pandas and visualise it. There are several Python data visualisation libraries, but here I used Wordcloud, Matplotlib and Seaborn. I wanted to see an image of the most used words in the subjects of my emails; here is how I did it.

Step 1: Reading and viewing the csv.
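A minimal sketch of this step, assuming the dataframe is called emails (the name used in the later steps). The two sample rows are made up so the snippet runs on its own; with the real export, the file name inbox_email.csv would be passed to read_csv instead.

```python
import io
import pandas as pd

# Hypothetical rows standing in for inbox_email.csv so the snippet is self-contained
sample_csv = io.StringIO(
    "Date,Sender,Subject\n"
    "2020-05-18 14:30:00,Jane Doe,Weekly newsletter\n"
    "2020-05-18 09:15:00,John Smith,Your invoice\n"
)

# With the real file this would be: emails = pd.read_csv("inbox_email.csv")
emails = pd.read_csv(sample_csv)
print(emails.head())
```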




Step 2: Getting statistical data.

I used the describe method to get summary statistics, such as counts and unique values, to get insight into what's in the data.
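As a rough illustration of what describe reports for text columns, here is a sketch on a small made-up dataframe; the rows are assumptions, not real email data.

```python
import pandas as pd

# Hypothetical sample with the same columns as the exported CSV
emails = pd.DataFrame(
    {
        "Date": ["2020-05-18 14:30:00", "2020-05-18 09:15:00", "2020-05-17 20:05:00"],
        "Sender": ["Jane Doe", "John Smith", "Jane Doe"],
        "Subject": ["Weekly newsletter", "Your invoice", "Weekly newsletter"],
    }
)

# include="all" asks for a summary of every column, not only numeric ones
summary = emails.describe(include="all")
print(summary)
```

For text columns the summary has the rows count, unique, top (the most frequent value) and freq (how often it appears).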




Step 3: Creating new variables.

I created two variables; Time and SinceMid. SinceMid is the number of hours after midnight.

(Note: The time can be removed from the date column completely)
from datetime import datetime
FMT = '%H:%M:%S'
# The Date column was saved as 'YYYY-MM-DD HH:MM:SS', so parse it with that format
emails['Time'] = emails['Date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S').strftime(FMT))
# Seconds since midnight, converted to hours
emails['SinceMid'] = emails['Time'].apply(lambda x: (datetime.strptime(x, FMT) - datetime.strptime("00:00:00", FMT)).seconds) / 60 / 60
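The arithmetic behind SinceMid can be checked on a single value. This is a small sketch using only the standard library, assuming the Date strings look like 2020-05-18 14:30:00; the function name hours_since_midnight is made up for illustration.

```python
from datetime import datetime

DATE_FMT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the Date column
TIME_FMT = "%H:%M:%S"


def hours_since_midnight(date_string):
    """Return the time of day in date_string as fractional hours after midnight."""
    time_string = datetime.strptime(date_string, DATE_FMT).strftime(TIME_FMT)
    delta = datetime.strptime(time_string, TIME_FMT) - datetime.strptime("00:00:00", TIME_FMT)
    return delta.seconds / 60 / 60


print(hours_since_midnight("2020-05-18 14:30:00"))  # 14.5
```

An email received at 14:30 is 52,200 seconds after midnight, which is 14.5 hours.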



Step 4: The plots.

I created a wordcloud image of the most used words in the subjects of my mails. In this example I did not set any custom stopwords. Stopwords (very common words such as "the" or "and") are usually filtered out because most of the time they're not informative.

from wordcloud import WordCloud
import matplotlib.pyplot as plt


# Join all the subject lines into one string
text = ""
for item in emails["Subject"]:
    if isinstance(item,str):
        text += " " + item
# str.replace returns a new string, so the result has to be reassigned
text = text.replace("'", "").replace(",", "").replace('"', '')


# Create the wordcloud object
wordcloud = WordCloud(width=800, height=800, background_color="white")

# Display the generated image:
wordcloud.generate(text)
plt.figure(figsize=(8,8))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.title("Most Used Subject Words", fontsize=20,ha="center", pad=20)
plt.show()
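As an aside on stopwords, the idea of filtering them out can be sketched with the standard library alone. The subject lines and the tiny stopword set below are made up for illustration; WordCloud itself accepts a stopwords argument that serves the same purpose.

```python
from collections import Counter

# Hypothetical subject lines and a small hand-picked stopword set
subjects = ["Your invoice for May", "The weekly newsletter", "Your weekly invoice"]
stopwords = {"your", "for", "the"}

words = " ".join(subjects).lower().split()
informative = [word for word in words if word not in stopwords]

# Count how often each remaining word appears
counts = Counter(informative)
print(counts.most_common(2))
```

With the stopwords removed, the most frequent words say more about what the emails are actually about.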

Here's the resulting word cloud:




I created a histogram of the hours after midnight using seaborn.

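import seaborn as sns
# histplot with kde=True and stat="density" replaces the deprecated distplot
sns.histplot(emails["SinceMid"], bins=20, kde=True, stat="density")
plt.title("Hours since midnight")
plt.show()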

Here is the histogram:




You can check out python gallery for more possible visualisations.

Conclusion

I had fun writing this, and I hope you had fun reading it. It goes without saying that I encountered ERRORS while doing this [some of them I had never seen before]. When you get error messages, a good starting point is using print statements to get insight and then googling the error message. Part II will also be published on this blog; it will focus on getting the body of the mail rather than the subject.

The full code can be found here.

Thank you for reading up to this point.

Disclaimer: I encourage you to experiment outside what's written here, if you encounter bugs you're on your own. But if you feel like getting me involved with your bugs [after Googling], send me a DM on Twitter @yomdroid [we'll pray about it and see what we can do]. Thank you in anticipation.

I've always been pretty intrigued by the idea of being a freelancer (not there yet!). A big part of that was because I imagined the Work-From-Home (WFH) life to be super cool and fun and all that. I took these times as trying out the beta version before I took the full plunge and I've found one caveat: being consistently productive can be a little difficult. One day I'm on a charged up A-mode and the next, I find myself lounging all day in my blanket. If you can relate, then we can both admit that we need a little help! I went digging and came out with these five tips that could help boost our productivity and glean from that WFH life in its entirety.


1. Do the hardest thing first.

Our peak energy levels are usually in the mornings – after a good night’s sleep, and an equally good breakfast. Even as a certified night owl, I find this to be true (this might not be so for everyone though). Point is, it’s a great idea to do your least enjoyable task when you can give it max focus. The feel-good sense of accomplishment you'll get from completing it will keep you hyped up to finish the rest.


2. “Kanbanize”.

“Kanban” is a Japanese word which means “signboard” or “sign”. Creating your own kanban board is especially good if you’re a more visually-motivated person. Divide your tasks into 'To Do', 'Doing' and 'Done'. Write them on cards or sticky notes and stick them to a board – or you could just use the sticky notes app on your laptop. As and when you complete tasks, you re-check your board and move things around. Be sure to reward yourself with a cup of hot chocolate (or mint tea!) when you get all your tasks into the 'Done' section.


3. Rubber Duck Debugging

“Rubber duck debugging” comes from a story in the book The Pragmatic Programmer. The original idea is to debug your code (all the programmers say “Yay!”) with the help of a rubber duck. You talk to the duck as you go through the lines of your code. This helps you spot and resolve the problem, or at least get clarity on what exactly the problem is. You can adapt this method to other areas of life. More often than not, verbalising your issues helps you figure out how to deal with them. Hey, you might not even need an actual rubber duck!


4. Timeboxing.

I find this incredibly helpful. Basically, you split up your day into blocks of time – and schedule different tasks to them. So say, for my first two hours today, I’m focusing solely on writing all my articles. For the last hour, I'll be checking all the dog videos sent to me on Instagram.


5. “No Zero Days”.

This might be the simplest method – and one I really love. It’s telling yourself “I’m going to finish one task today, by hook or by crook!” and following through. So go ahead and complete that section of your thesis and pat yourself on the back for being such a badass. Repeat tomorrow.


Let me know which of these tips is your favourite, and which ones do not work well for you, at lilectmensah@gmail.com or on IG: @pa.bby . Enjoy that WFH life!

Updated: Jun 5

I used to babysit four-year-old Jude. I remember when we played in the yard, he could abruptly switch from running around, bursting with boundless enthusiasm, to sitting on the ground, all of a sudden! He would then look me pointedly in the eye and ask, “Now, What?!”

Inasmuch as I’d rather not mention the words “COVID-19”, “coronavirus”, “pandemic” in this piece, they have changed our world and our lifestyle as we know it, willy-nilly. Together with Jude, I ask, “Now, What?!”


Whether you’re working from home or not, everything is not all right with the world and we must admit that we all have been affected, even if it’s imperceptible. I'd like to suggest ways that we can be a help to ourselves and to one another in our daily lives, and especially in the workplace.


Try And Exercise Empathy As A Colleague

In this era where “self-love and authenticity” are the buzzwords, being your true self is encouraged and embraced. Bringing “the real you” to work can help you perform better and amp up your job satisfaction. Remember though that not everyone is naturally bubbly and even the usual happy-go-lucky dude or dame may be down in unusual times like these. If you meet a moody colleague, don’t immediately take it personally. He or she may be battling anxiety or some personal crisis and might not be comfortable opening up.


You can start with “How’re you?” More often than not, the reply is “I’m fine”. You can then follow up with “I know you said you were fine when I asked earlier, but really, how ARE you? I can sense that something is bothering you, and I just wanted to check in again.” Follow their lead on how much – or how little – they want to share. Don’t forcefully probe. Some people like to take time to analyze things internally before they share (or don’t), and that’s okay. You can end with “I value your privacy. Whenever you want to talk, I'm here. I won’t pry if you don’t.” This could help enable a healthy and trusting environment.


Say “No” To Your Perfectionist Tendencies

Perfection is the camouflaged enemy of productivity. Do you rigidly cling to habits that no longer benefit you? Do you drag yourself through the week because you’re exhausted from feeling the need to over-deliver all the time? Do you fail at making decisions promptly because you’re obsessed with not making the wrong one? Remember that your greatest asset is not time, but rather ENERGY. If you pile too much on your plate, you won’t be able to bring your A-game to every task. You might get through the week, but produce half-baked results.

Be self-aware and hold yourself accountable. For instance, you can set a rule such as “I’m giving myself fifteen minutes to think about this, and at the end I will make a decision and get it done and over with”.

Go through your daily commitments and do the cost-benefit analysis. Prioritize the most valuable ones and give the first bolt of your energy to them. You may find that some daily rituals – such as your meal schedules, or your morning routine – surprisingly drain more of your energy than they restore.


View Your Virtual Meetings As An Experience You Can Glean From.

What's your mind-set as you get ready for that video meeting? If you keep telling yourself that it’s a drudgery, it definitely will be. Tell yourself “I am going to give my best possible. I'm going to make an impact and I’m going to benefit from this.” Dress up. Show up. Focus on the camera, not on your colleagues' faces and their backgrounds. In fact, don’t worry about how you might be appearing on their screens. Speak up and own that meeting!

If you’re leading especially, begin by acknowledging everyone present and the effort they put in to show up, with all that is going on. In smaller meetings you can check in with each person, before hitting the agenda for the day. Starting with an icebreaker can be very helpful. Be sure to record and share the link to key meetings so that those who were not able to participate can retroactively access the materials. This would help your meetings be more inclusive for everyone in your team.

Hope you found these reminders helpful! We're in trying circumstances but we will tackle each day as it comes and emerge stronger at the end of all this. Heck, we might even have some fun!


If you have any questions or you want to talk some more, hit me up via email: lilectmensah@gmail.com or on Instagram: @pa.bby .
