We often receive data that is spread across multiple sets of columns. Here is how you can convert it into clean tabular data using Power Query. Watch the video. Sample File download.
This is the worst type of input data to get. Each row has been split into two (or sometimes more) rows, and we want those combined into a single row. In an earlier article, I showed one method of doing this.
Now here is a simpler, faster and more powerful method using Power Query.
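To make the goal concrete, here is a minimal Python sketch of the idea (not the Power Query steps from the article). It assumes every record is split across a fixed number of consecutive rows; the function name `combine_split_rows` and the sample invoice data are hypothetical, made up purely for illustration.

```python
from typing import List, Sequence


def combine_split_rows(rows: Sequence[Sequence[str]],
                       rows_per_record: int = 2) -> List[List[str]]:
    """Join every `rows_per_record` consecutive rows into one wide row."""
    if rows_per_record < 1:
        raise ValueError("rows_per_record must be at least 1")
    if len(rows) % rows_per_record != 0:
        raise ValueError("row count is not a multiple of rows_per_record")

    combined = []
    for start in range(0, len(rows), rows_per_record):
        record: List[str] = []
        # Append the pieces of one logical record side by side.
        for part in rows[start:start + rows_per_record]:
            record.extend(part)
        combined.append(record)
    return combined


# Hypothetical input: each invoice is split across two rows.
raw = [
    ["INV-001", "2015-01-05"],
    ["Acme Ltd", "1200"],
    ["INV-002", "2015-01-06"],
    ["Globex", "450"],
]

tidy = combine_split_rows(raw)
for row in tidy:
    print(row)
```

If the number of rows per record varies, this fixed-size grouping will not work and you would need some marker (for example, a column that is filled only on the first row of each record) to decide where a record starts.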
(Estimated reading time 12 min)
Thank you for the overwhelming response. More than 800 attendees, 70+ questions.
Watch the video on YouTube (use 720p) or Download (MP4, 188 MB, 60 min) and view on your PC. A big thank you to the Microsoft and Economic Times team for making this possible.
Two simple approaches. One is to add an extra column with serial numbers and then sort on that column in descending order. This works for small amounts of data. For large data, it is best to import it into Power Query and choose Transform tab – Reverse Rows.
Benefit of Power Query? It works on a small preview of the data and applies the transformation to the full data only when you choose the Close & Load option. This is much faster than getting all the data into the sheet and then trying to sort it (which is the first method).
Why is this required? It is usually needed for logs where the earliest transactions or rows are at the bottom, so the data is received in reverse chronological order. Twitter feeds, Timeline Updates, Live blogs – all follow this pattern.
This method works independently of the timestamp column. What is wrong with the timestamp? It may be in different time formats, some rows may have the same timestamp, and some rows may have no timestamp at all.
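Both approaches can be sketched in a few lines of Python. This is only an illustration of the logic, not what Excel or Power Query does internally, and the sample log rows are made up. Note that two rows share a timestamp and one has none, yet the reversal still works because it never looks at that column.

```python
# Hypothetical log, newest entry first (reverse chronological order).
log = [
    ("10:05", "Order shipped"),
    ("10:05", "Payment received"),   # same timestamp as the row above
    (None, "Order placed"),          # missing timestamp
]

# Approach 1: add serial numbers, then sort on them in descending order.
numbered = list(enumerate(log, start=1))            # (serial, row) pairs
numbered.sort(key=lambda pair: pair[0], reverse=True)
by_serial = [row for _, row in numbered]

# Approach 2: reverse the rows directly (the idea behind Reverse Rows).
reversed_rows = log[::-1]

for row in reversed_rows:
    print(row)
```

Both lists come out in the same order. The difference is that the first approach needs a helper column and a sort, while the second is a single step.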
Just one button to press. Data – Remove Duplicates.
The data should be a table (or laid out like one, with column headings and data below). If it is a table, choose Table Tools – Remove Duplicates.
By default, duplication is checked for ENTIRE ROW.
Clear the checkboxes for the columns you want to ignore, so that duplicates are checked on specific fields only.
Be careful. Duplicate rows are DELETED.
This is unlike Advanced Filter – Unique Rows only – where duplicates are HIDDEN.
For large data – Power Query – Remove Duplicates is far more powerful and faster. We will cover it in a separate article.
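The difference between checking the entire row and checking specific fields is easy to see in a small Python sketch. It keeps the first occurrence of each duplicate, as Excel does. It is an illustration only: the helper name `remove_duplicates`, its `key_columns` parameter and the sample rows are hypothetical, and the sketch compares values exactly.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def remove_duplicates(rows: Sequence[Tuple[Hashable, ...]],
                      key_columns: Optional[Sequence[int]] = None
                      ) -> List[Tuple[Hashable, ...]]:
    """Keep the first row for each key; later rows with the same key are dropped.

    key_columns=None compares the entire row (the default behaviour described
    above); otherwise only the listed column positions are compared.
    """
    seen = set()
    kept = []
    for row in rows:
        if key_columns is None:
            key = tuple(row)
        else:
            key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical data: (name, city, amount)
rows = [
    ("Asha", "Pune", 100),
    ("Asha", "Pune", 100),    # exact duplicate of the first row
    ("Asha", "Mumbai", 100),  # same name, different city
    ("Ravi", "Pune", 250),
]

entire_row = remove_duplicates(rows)                 # all columns ticked
name_only = remove_duplicates(rows, key_columns=[0]) # only the name column ticked

print(entire_row)
print(name_only)
```

Unlike Excel, this sketch returns a new list and leaves `rows` untouched, so nothing is lost if you picked the wrong columns. In Excel, work on a copy of the data if you are not sure.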
A single row of data split into two rows! Trying to clean up such data is a frustrating experience. Here is how you do it – faster and smarter! 3 min video
Here is a quick overview of how we can analyze data more effectively using new Excel tools. You can also refer to the collection of 51 articles about data analytics for a more detailed coverage. Data Analytics: Knowledge Pack.