If you don't know, Power Query is a data connection technology that allows you to discover, connect, combine, and refine data across various sources. Power Query offers a robust and high-performing data management experience when used well. I repeat when used well.
If you're new to Power BI, the loose ideal that I follow is:
If it can be done in Power Query, it probably should be done in Power Query.
But that is oversimplifying, and to that end, this blog post will guide you through some tips and best practices to optimise your Power Query operations.
Power Query offers a growing selection of data connectors ranging from TXT, CSV, and Excel files, databases such as Microsoft SQL Server, and popular SaaS services such as Microsoft Dynamics 365 and Salesforce.
If your data source isn't directly listed, you can use the ODBC or OLEDB connectors.
It feels weird to say this, but choosing the right connector is essential for maximising performance.
Ensure you get this step right.
Don't forget that each data connector adheres to an everyday experience with a stage called Data Preview.
Here, you can conveniently select your data and preview it before extraction.
Implement filters as early as possible in your query to avoid unnecessary computation.
Power Query takes advantage of your filters through query folding, resulting in increased performance.
Also, filter out irrelevant data to focus better on the task.
Power Query offers various filtering options. You can use the auto filter menu, which displays a unique list of values in your column, or type-specific filters for date, datetime, or even date timezone columns.
Certain operations require reading the whole data source to yield results, making them slow to preview. If possible, perform such "expensive" functions last.
This will help reduce the amount of time you spend waiting for the preview to render each time you add a new step to your query.
Some examples of expensive operations are:
If adding new steps to your query is slow, consider first performing a "Keep First Rows" operation and limiting the number of rows you're working against.
This way, you can add all the necessary steps quickly and then remove the "Keep First Rows" step later.
Useful when working on slower machines or through a proxy or remote desktop, too.
Having correct data types for your columns is crucial.
Features in Power Query are contextual to the data type of the column selected.
For example, specific options and transformations will be unavailable if the column type isn't set. It's also essential when working with type-specific filters clear to particular data types.
Some more performance, data integrity, and functionality issues are at play.
Bottom line - select the correct data type.
Remember to document your queries by renaming or adding descriptions to your steps, questions, or groups. This helps to keep your work organised and understandable, especially after you move on and some other developer picks up where you left off.
Next week we will go through a handy documentation hack that will speed up this process tenfold.
If a query contains many steps, consider splitting it into multiple questions where one query references the next.
This makes your transformation phases easier to understand by breaking them into smaller, digestible parts.
It can get a little weird, so ensure comments/documentation inside and outside of Power BI.
Groups in Power Query are like folders for your queries. They help keep your work organised. You can even create groups within groups if necessary.
Again, just getting ourselves organised.
Parameters can make your queries more dynamic and flexible. They allow you to store and manage a value that easily.
Things like starting your calendar from the first date of a data set.
Parameters can make life easier.
Anyway, Power Query is a gem. There are a lot of resources out there, and it's wise to refine your skills regularly.