Well, while we are on the topic of data cleaning and analytics, here is another interesting article. In fact, this article also falls under the category of “What did I Learn Today?”.
Like many of you, I am also a part of many WhatsApp groups. I wanted to check how the chat transcript can be analyzed. Here is the process I followed using Power Query and Excel.
Get the WhatsApp Transcript by mail
This is easy. Just mail to yourself and then use the attached text file as the input for further analysis. DO NOT export it with media. The file size can get very large if it is a busy group.
Why not use Text Import Wizard of Excel
I am going to use Power Query instead of the regular text import wizard. This is because, Power View Gives me many more options which the import wizard does not provider. Read on and you will find out the difference.
You can download and install Power Query from here.
Typical WhatsApp transcript looks like this.
Data Clean Up
There are lots of steps involved. So I am going to keep the description brief – explaining only things which are not obvious. Ten steps required to get clean data for further analysis.
Open Power Query Tab and use From File – Text file. The data is imported.
This just splits the Time column. Using the delimiter only ONCE is a very good option which is not available with traditional Text Import Wizard of Excel.
Similarly, split the date portion using – as the delimiter.
Now split the name of the person. It is always followed by a :
This is the data with columns renamed.
Remove unwanted dates
The data gets mixed up if there are CRLF (Enter key) within the messages. To clear those rows, I removed anything with Null in the Date column. You will need to look at your data and you may need to use a different approach.
Finally these are the steps which were required. Next time you get another transcript, you can keep the file name same and overwrite the old file with the new file and refresh this query. It is now automated!
Now the analytics
This is very easy using a Pivot Table and Pivot Chart. Here are some examples.
Dynamic Text Analysis
Suppose you want to search for the instances of a particular word, you will need a calculated column. The search word is kept in a separate cell outside the data and Find is used to check for the occurrence.
This column should be used in value area for counting various the occurrence.
This is a workable solution, but is cumbersome because you have to refresh the pivot table every time you change the search condition. Power View provides a better solution for this.
Using Power View
Here I have a simple Power View dashboard of the data with three charts. Who posted when by date, Posts by person Pie chart and Posts by time. All these charts are based upon all the posts.
Here I am assuming that you already know how to use Power View. I have covered it briefly in this post – Working with Excel data? There is a revolution happening. Learn Power BI.
I will cover more about Power View in upcoming posts.
Now we want to do textual analysis. To do so, you add the Message to Filter Area and choose the other option Advanced Filter Mode
Now you can type the text you are looking for and press Enter. This filter is immediately applied to all the charts. No refresh is required.
This is the same dashboard showing only those posts where the word Test was contained.
Practical Usage Scenarios
Now that we know the technique behind it, let us think where we can use this in practice. Of course there are two type of groups – informal and formal.
Informal groups are formed between friends, co-workers, alumni, people who have come together due to some common interest / cause. They can perform such analysis to find out participation, contribution, etc. as shown above.
For formal groups, analytics can be even more useful.
- For example, it can be used for quick auditing to find a particular context.
- Another useful thing is to get all the messages by a particular person in a specific time period.
- Often URLs or references are posted. Searching by “http” can be very useful here.
- Using Power Query simplified the text processing. It was eminently better than using Text Import Wizard.
- The effort put in for cleaning up the data is only a One Time Effort. Next time, all the work will be done by Power Query.
- Now we can focus on the analysis rather than wasting time on the clean up.
Key Learning: Next time you are using a traditional clean-up method, think if there is a better alternative available.