[Solved] Excel Power Query: get data from multiple unfixed files with multiple unfixed sheets


Sorry for the length of this response, but there are a lot of steps involved and I included quite a bit of screen clips as well. I believe this solution does what you are looking for.

I start with files in a folder:

enter image description here

Feb-19.xlsx contains two tabs:

1. DE:
2. CA:

Jan-19.xlsx contains three tabs:

1. US:
2. UK:
3. CH:

I open a new Excel file, then click Data > New Query > From File > From Folder and either type in, or use the Browse button to navigate to, the location of the folder that has the files. (When I navigate to my OneDrive folder, my path has SkyDrive.old in it, but it is my OneDrive folder that you saw in the first image above.) Then I click OK:

enter image description here

Then I click Transform Data:

enter image description here

This appears:

enter image description here

I click Home > Manage Parameters (the words, with the drop-down arrow) > New Parameter, and I set it up like this and click OK.

enter image description here

After clicking OK, this appears:

enter image description here

You can see that I entered the path for the folder that has the files. I can change this parameter value later if I want to use a different folder path.

To do that, I would click on enter image description here in the left pane. Clicking it would bring me to this same place, where I can edit the value.

Now, I click on the query that I had already started. It is currently the only other item in the left pane. Clicking it brings this back up on the screen:

enter image description here

I edit the text in the formula bar, replacing "C:\Users\MARC_000\SkyDrive.old\Test" with FolderPath. The result is the exact same table, but the formula bar has Folder.Files(FolderPath). Now, instead of using a hard-coded reference, the query is using the parameter value.

Then, just because I want to, I change the query’s name to “Main Query.” You can do that by clicking on the query in the left pane, then changing the name in the PROPERTIES at the top of the right pane.

Next, I select both the Content and Name columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

enter image description here

Then I click the enter image description here button to combine the files in the Content column, which brings up this pop-up. Then I click on just the folder only, and then OK.

enter image description here

Now there are more query entries in the left pane:

enter image description here

I click on the new query, Transform Sample File from Test, and see this:

enter image description here

I select the Data and Item columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

enter image description here

—SEE EDIT AT BOTTOM OF ANSWER, WHICH REPLACES THE FOLLOWING—

Then I click the enter image description here button to expand the tables in the Data column, which brings up this pop-up. Then I clear the checkbox beside “Use original column name as prefix” and click OK.

enter image description here

—RETURN FROM EDIT AT BOTTOM OF ANSWER TO CONTINUE—

Which yields this:

enter image description here

Then I filter out null values from the Column1 column. (Click the down arrow at the top of the column and deselect null.)

Then I click Add Column > Conditional Column, and set it up like this, and click OK:

enter image description here

Which yields this:

enter image description here

Then I select the new Custom column and click Transform > Fill > Down, to get this:

enter image description here

Then I filter out “Summary (Amount)” and “Summary (Units)” entries from the Column1 column. (Click the down arrow at the top of the column and deselect ‘Summary (Amount)’ and ‘Summary (Units)’.) Which yields this:

enter image description here

Now I go back to the Main Query. In other words, click on Main Query in the left pane. There will be a “problem.” All I need to do is delete the last APPLIED STEP in the right pane: Changed Type. Once I delete that, all is good and I see this:

enter image description here

But I also want file names, so I click on the APPLIED STEP that is before the one that is currently selected, “Expanded Transform File from Test” is selected, so I click “Removed Other Columns 1”, and in the formula bar, I change the code from Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test"}) to Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test", "Name"}). This adds the Name column and I see this:

enter image description here

Then I go back to the last APPLIED STEP, which is “Expanded Transform File from Test” and now I see this:

enter image description here

Then I click Transform > Use First Row as Headers and get this:

enter image description here

Then I rename the DE column to Sheet and the Feb-19.xlsx column to File Name.

Then I filter out “Custname” entries from the Custname column. (Click the down arrow at the top of the column and deselect ‘Custname’.) Which yields this:

enter image description here

Then I reordered the columns to get this:

enter image description here

Then I select the Summary Type column and click Transform > Group By, and fill out the pop-up box like this and click OK:

enter image description here

Which yields this (these are your two tables):

enter image description here

So then I right-click on the Main Query in the left pane, and select Reference. That gives me a new query named Main Query (2), with a table that looks just like the last image above. Now I click on the table in Summary (Units) row and get this:

enter image description here

Then I repeat the process for the Summary (Amount): I right-click the Main Query in the left pane, select Reference, and then click on the table in the new query’s Summary (Amount) Row to get this:

enter image description here

Lastly, I rename the two newest queries “Summary (Units)” and “Summary (Amount)”

When you close and load, this will give you three new worksheets. One for each query. If you don’t want a worksheet for the Main Query (If you only want the Summary (Units) and Summary (Amount)) then, after you close and load and are back in Excel, click Data > Show Queries. Then right-click the Main Query in the right pane and click Load To, then select “Only Create Connection” and click Load. Click Continue when you get the data loss warning.

One more last thing: Do not put the Excel Workbook that has this query in it in its source folder, with the files it is getting the information from. Keep it separate.

—Edits to accommodate top rows having transactional information—

I’m adding the following to deal with sheets that might have rows of information above the Summary tables. Here’s what I came up with:

In the answer above, beginning immediately after the step where I said: I select the Data and Item columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

enter image description here

I now add another column (Add Column > Custom Column), and I set it up like this:

enter image description here

This makes a duplicate of the Data column, but adds an index within each of the nested tables, like this:

enter image description here

Then I add another column to determine the index number associated with the start of each summary in each nested table:

enter image description here
(You may want to search for “Summary (” or “Summary (Units)” instead of “Summary”)

Note that it is constructed similarly to the previous column, in that it is basically a duplicate of the Indexed column, only with the Summary Index column added within each nested table.

enter image description here

Then I add another column like this, to determine the index position of the first Summary table’s first line for each nested table:

enter image description here

and get this:

enter image description here

Then I add one more column like this, to remove the top rows that I don’t want within each nested table:

enter image description here

Which gives me this:

enter image description here
(The table selected in this image is the one that had the extra information at the top. That information is gone now.)

Then I select the TopRowsRemoved and Item columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

enter image description here

Then, I click the enter image description here button to expand the tables in the TopRowsRemoved column (instead of the Data column, which we had done before), which brings up this pop-up that looks exactly the same as when we’d used the Data column. Then I clear the checkbox beside “Use original column name as prefix” and click OK.

enter image description here

Then I delete the old Expanded Data step, under APPLIED STEPS in the right hand pane. If I don’t delete the Expanded Data step, I’ll get an error because it’s looking for the Data column, which doesn’t exist. I didn’t use the Data column this time. Instead, I used the TopRowsRemoved column.

At this point, the rest of my previous answer still applies, so refer back to where I wrote —RETURN FROM EDIT AT BOTTOM OF ANSWER TO CONTINUE— above.

Here’s my M code for the “Main Query” query:

let
Source = Folder.Files(FolderPath),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
#"Invoke Custom Function1" = Table.AddColumn(#"Removed Other Columns", "Transform File from Test", each #"Transform File from Test"([Content])),
#"Filtered Hidden Files1" = Table.SelectRows(#"Invoke Custom Function1", each [Attributes]?[Hidden]? <> true),
#"Removed Other Columns1" = Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test", "Name"}),
#"Expanded Transform File from Test" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Test", {"Column1", "Column2", "Column3", "Column4", "Column5", "Item", "Custom"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Item", "Custom"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Transform File from Test", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"CustName", type text}, {"Opening", type any}, {"In", type any}, {"Out", type any}, {"Closing", type any}, {"DE", type text}, {"Summary (Units)", type text}, {"Feb-19.xlsx", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"DE", "Sheet"}, {"Feb-19.xlsx", "File Name"}, {"Summary (Units)", "Summary Type"}}),
#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([CustName] <> "CustName")),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows",{"File Name", "Sheet", "CustName", "Opening", "In", "Out", "Closing", "Summary Type"}),
#"Grouped Rows" = Table.Group(#"Reordered Columns", {"Summary Type"}, {{"AllData", each _, type table}})
in
#"Grouped Rows"

Here’s my M code for the “Transform Sample File from Test” query, with the changes to accommodate top rows having transactional information:

let
Source = Excel.Workbook(#"Sample File Parameter1", null, true),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Data","Item"}),
#"Added Index" = Table.AddColumn(#"Removed Other Columns", "Indexed", each Table.AddIndexColumn([Data],"Index", 0, 1)),
#"Added Custom1" = Table.AddColumn(#"Added Index", "SummaryIndexed", each Table.AddColumn([Indexed],"Summary Index", each try if Text.StartsWith([Column1],"Summary") then [Index] else null otherwise null)),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "IndexMins", each List.Min([SummaryIndexed][Summary Index])),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "TopRowsRemoved", each Table.RemoveFirstN([SummaryIndexed],[IndexMins])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"TopRowsRemoved", "Item"}),
#"Expanded TopRowsRemoved" = Table.ExpandTableColumn(#"Removed Other Columns1", "TopRowsRemoved", {"Column1", "Column2", "Column3", "Column4", "Column5", "Index", "Summary Index"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Index", "Summary Index"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded TopRowsRemoved", each ([Column1] <> null)),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each if Text.StartsWith([Column1],"Summary") then [Column1] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
#"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([Column1] <> "Summary (Amount)" and [Column1] <> "Summary (Units)"))
in
#"Filtered Rows1"

7

solved Excel Power Query: get data from multiple unfixed files with multiple unfixed sheets