WhatsApp Group Chat Analytics using Excel and Power Query

Well, while we are on the topic of data cleaning and analytics, here is another interesting article. In fact, this article also falls under the category of “What did I Learn Today?”.

Like many of you, I am also a part of many WhatsApp groups. I wanted to check how the chat transcript can be analyzed. Here is the process I followed using Power Query and Excel.

image

Get the WhatsApp Transcript by mail

This is easy. Just mail to yourself and then use the attached text file as the input for further analysis. DO NOT export it with media. The file size can get very large if it is a busy group.

Why not use Text Import Wizard of Excel

I am going to use Power Query instead of the regular text import wizard. This is because, Power View Gives me many more options which the import wizard does not provider. Read on and you will find out the difference.

You can download and install Power Query from here.

Typical WhatsApp transcript looks like this.

SNAGHTML248d4c

Data Clean Up

There are lots of steps involved. So I am going to keep the description brief – explaining only things which are not obvious. Ten steps required to get clean data for further analysis.

Open Power Query Tab and use From File – Text file. The data is imported.

SNAGHTMLafa92b6

This just splits the Time column. Using the delimiter only ONCE is a very good option which is not available with traditional Text Import Wizard of Excel.

image

SNAGHTMLafc96d3

Similarly, split the date portion using – as the delimiter.

image

SNAGHTMLafe5de7

Now split the name of the person. It is always followed by a :

SNAGHTMLb020db1

image

This is the data with columns renamed.

SNAGHTMLb064596

Remove unwanted dates

The data gets mixed up if there are CRLF (Enter key) within the messages. To clear those rows, I removed anything with Null in the Date column. You will need to look at your data and you may need to use a different approach.

SNAGHTMLb0d9d59

image

Finally these are the steps which were required. Next time you get another transcript, you can keep the file name same and overwrite the old file with the new file and refresh this query. It is now automated!

image

Now the analytics

This is very easy using a Pivot Table and Pivot Chart. Here are some examples.

image

image

image

image

Dynamic Text Analysis

Suppose you want to search for the instances of a particular word, you will need a calculated column. The search word is kept in a separate cell outside the data and Find is used to check for the occurrence.

image

This column should be used in value area for counting various the occurrence.

image

This is a workable solution, but is cumbersome because you have to refresh the pivot table every time you change the search condition. Power View provides a better solution for this.

Using Power View

Here I have a simple Power View dashboard of the data with three charts. Who posted when by date, Posts by person Pie chart and Posts by time. All these charts are based upon all the posts.

Here I am assuming that you already know how to use Power View. I have covered it briefly in this post – Working with Excel data? There is a revolution happening. Learn Power BI.

I will cover more about Power View in upcoming posts.

image

Now we want to do textual analysis. To do so, you add the Message to Filter Area and choose the other option Advanced Filter Mode

image

Now you can type the text you are looking for and press Enter. This filter is immediately applied to all the charts. No refresh is required.

image

This is the same dashboard showing only those posts where the word Test was contained.

image

Practical Usage Scenarios

Now that we know the technique behind it, let us think where we can use this in practice. Of course there are two type of groups – informal and formal.

Informal groups are formed between friends, co-workers, alumni, people who have come together due to some common interest / cause. They can perform such analysis to find out participation, contribution, etc. as shown above.

For formal groups, analytics can be even more useful.

  • For example, it can be used for quick auditing to find a particular context.
  • Another useful thing is to get all the messages by a particular person in a specific time period.
  • Often URLs or references are posted. Searching by “http” can be very useful here.

Summary

  • Using Power Query simplified the text processing. It was eminently better than using Text Import Wizard.
  • The effort put in for cleaning up the data is only a One Time Effort. Next time, all the work will be done by Power Query.
  • Now we can focus on the analysis rather than wasting time on the clean up.

Key Learning: Next time you are using a traditional clean-up method, think if there is a better alternative available.

Clock

7 thoughts on “WhatsApp Group Chat Analytics using Excel and Power Query

  1. Interesting BUT I could not do it for my chat conversation.

    My chat format is different.
    9/12/13, 6:47:47 AM: XYZ: Text
    9/12/13, 12:41:11 AM: ABC: Text1

    I could split that however if you can suggest better way. but when somebody sends some joke,quote then it spills to multiple lines

    9/12/13, 2:04:56 AM: XYZ: A young unmarried girl Joke.
    Scared??..She Joke.

    How to handle that.

  2. Pingback: 200+ posts – 200,000+ users .. aiming for more zeros! | Efficiency 365

  3. Pingback: Knowledge Pack: Data Analytics | Efficiency 365

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s