Starting today, we will gradually introduce some terms in order to prepare for our ultimate goal.
Previously, we imported the data in the file/folder through Power Query, and did preliminary sorting and filtering, but in this process, we did not manipulate the original file/folder, or even opened it; we Just found the original data content, displayed the data we wanted on the screen, and loaded it into a new Excel file. During this process, the data file itself has not been changed.
Formally, we call this process of finding and selecting the required data through a certain method (such as Power Query) as "Query". In the future, we will no longer use terms such as "importing data" or "data importing process", and formally call it "Query".
query settings
In the initial example, we created a query for the data in the Sample1 file and saved it. Let's reopen this query. This section will focus on the panel on the right side of the Power Query editor, namely "Query Settings". The first part of
query setting is "query attribute", here there is only one important attribute, namely the name of this query. By default, the Power Query editor will use the worksheet name, file name or folder name to name the query, but changing it to a meaningful name is an excellent programming practice, oh no, please ignore the sentence just now "Programming" in the words. Z1z
renaming the query will allow us to better identify it, and provide convenience for future or sharing with others. We can name it "single store inventory query"-directly enter the new name in the name box and press Enter to confirm.
"Steps of application" in query settings is the focus of today. It saves our entire process from finding data source files to selecting data and organizing data, which is the query step of Power Query. Every time we run a query, we actually operate on the data in the order of the steps. Today’s work is to understand how Power Query turns the original data into what we want step by step by interpreting these steps. let's start.
Find the data file: The first step of querying the source
is generally "source", that is, to find the data file, click on the step "source" (mark ①), we can see some familiar figures in the Power Query editor : First, there are two columns (mark ②) in the data area, which represent the worksheet name and worksheet data respectively (don’t forget that the data itself is an "entity", which can also appear in the cell as a data) ; Click on the cell representing the worksheet data, you can see the actual content of the data in the preview area (mark ③); in the edit bar, we see the familiar Excel.Workbook () function (mark ④), its role It is to open the content of the specified file, but the "specified file" here is not installed in the "Content" column, but uses another function to connect to the path of the file to open.
Please note that there is a small gear on the right side of the "source" step, which indicates that we can modify the parameters of this step. Click the small gear, and a pane similar to "modify data source" pops up. Its function is the same as that of modifying the data source under the data tab of Excel as we mentioned before: modifying this parameter will modify the data of this query File address.
In fact, modifying parameters through this pane is essentially modifying the address part of the Excel.Workbook() function in the edit bar. In some cases, the small gear on the right side of the "source" step will disappear (this may be a bug in Power Query), and we cannot modify the parameters of this step through the pop-up pane, nor can we modify it under the data tab of Excel Data source address, but we can directly modify the parameters of the formula in the edit bar.
Find the specified worksheet: Navigation
After we find the file through the first step, the second step of the query is of course to find a worksheet in the file, which is called "navigation" here. TheThe steps find the worksheet we want, and expand the data of the worksheet. Through the second step of running the query, we have found the data we need, but what we see is the original appearance of the data: there are a lot of empty rows, the columns are not sorted according to our expectations, some columns are not typed, Even the first column has not been recognized as a title.
Please note that there is also a small gear on the right side of the "Navigation" step. It is easy to think that its function is to set which worksheet we are looking for. The actual effect of modifying this setting is actually to modify the corresponding worksheet name in the function in the formula bar. parameter.
Sort the original data: improve the title, change the type, etc.
After running the first two steps, we found the original data, but we need to further sort the original data.
first promotes the first row to the column name, or title, this step Power Query has automatically done it for us, called "enhanced title"... At first glance, this is poor quality machine translation, in order to maintain our excellent naming It’s customary to rename this step to a meaningful name, such as "promote the first row to the title"
. Secondly, Power Query has helped us to automatically identify each column, but as we mentioned earlier, don’t Believe in automatic type recognition, check everything again and modify as needed. Please note that when we change the type, Power Query may remind us "whether to insert a step", please select "Yes" first; "replace a step or add a new step", please select "replace step" first.
Please note that here we have modified the automatic generation steps.
In addition, we also filtered the rows and reordered the columns. These operations are saved by Power Query as query steps. We can click on these steps one by one to check whether these steps are what we did at the time.
Finally, let's change these steps to meaningful and easy-to-understand names.
Reordering the query steps
Although Power Query can automatically generate steps, these steps can be changed. For example, we added two steps (deleting blank lines, rearranging), and modified the automatically generated steps (manually changed based on the automatic recognition type). But in fact, we can also perform some other operations.
For example, the last three steps in this query "change data type", "filter valid rows" and "reorder columns" are actually independent of each other. Changing their order will not affect the query results.
However, if we move the order of "promote the first row to heading", there is a high possibility of problems.
At this time, because "change data type", "filter valid rows" and "reorder columns" are essentially changing the data type, filtering situation and order of each column through the column header. For example, to filter valid rows is actually to filter out the rows where the "product name" column is not null and "" and the "inventory quantity" is not equal to 0:
This is a philosophical question: the prerequisite for operating the column headings is existence The corresponding column headings ("Product Name", "Stock Quantity"). In fact, the column headings of this query are generated by the step of "promote the first row to headings", so the step of generating column headings (to be precise, named column headings) must be placed in all the steps that need to be performed on the column headings Before the operation steps, the column headings can be found and operated. In a more general way, if some steps have a dependency on their previous steps (some intermediate steps), then changing the dependent steps is very likely to cause problems with the subsequent dependent steps, and then make the entire query problematic.
delete query step
believe that readers have noticed, each step has a small X before it, which means that we can delete the step. If a certain step is deleted, we will lose some work. Similar to the modification steps, deleting the dependent steps can also cause query errors.
Add and insert steps
Before, we added two steps of "filtering valid rows" and "reordering columns" by operating Power Query. In fact, you did not notice that we all operated when we selected the last step of the query at that time. This operation A new step is added to the end of the query for us; however, if we select the middle step of the query to operate, the result will be "insert" the step after the selected step instead of adding a new step at the end. If the insert step changes the part that is dependent on the subsequent steps, it may also cause subsequent query errors. We have received a prompt when we selected "Change data type" to operate before.
A suggestion for the query steps
Since in so many cases, there may be problems in the query due to dependencies, I strongly recommend that unless the reader can understand the principle of each step, and can clarify the steps Dependency relationship, you should try to avoid inserting, modifying or deleting steps in the middle of a query. On the contrary, we should achieve our need to further organize the data by adding steps at the end of the query.
If an operation is not completely processed in the previous step, we can also add a new step to continue processing it instead of using the previously used "replacement step" operation. Power Query does not require all the operations of the same function to be placed in the same step, nor does it require that these steps must be arranged consecutively.
Similarly, if we find that there are some errors in arranging the data, we can delete the next steps of the query error until we need to keep the steps, and then rearrange. Of course, we can also use "Delete to the end" to delete a step and all subsequent steps