{"id":5398,"date":"2022-08-28T12:12:06","date_gmt":"2022-08-28T06:42:06","guid":{"rendered":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/"},"modified":"2022-08-28T12:12:06","modified_gmt":"2022-08-28T06:42:06","slug":"solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column","status":"publish","type":"post","link":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/","title":{"rendered":"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column"},"content":{"rendered":"<div id=\"answer-51552602\" class=\"answer js-answer accepted-answer js-accepted-answer\" data-answerid=\"51552602\" data-parentid=\"51546208\" data-score=\"1\" data-position-on-page=\"1\" data-highest-scored=\"1\" data-question-has-accepted-highest-score=\"1\" itemprop=\"acceptedAnswer\" itemscope itemtype=\"https:\/\/schema.org\/Answer\">\n<div class=\"post-layout\">\n<div class=\"votecell post-layout--left\"><\/div>\n<div class=\"answercell post-layout--right\">\n<div class=\"s-prose js-post-body\" itemprop=\"text\">\n<p>The crucial point in the OP&#8217;s approach is the staggered aggregation (see the related question &#8220;row not consolidating duplicates in R when using multiple months in Date Filter&#8221;).<\/p>\n<p>The OP wants to aggregate data across a number of files, which are apparently too large to be loaded all at once and combined into one large data.table.<\/p>\n<p>Instead, each file is read and aggregated separately. The subtotals are combined into a data.table, from which the overall totals are computed in a second aggregation step.<\/p>\n<p>Now, the OP wants to include sums as well as averages in the aggregation steps. 
The staggered aggregation works for sums and counts but not for means. For example, <code>mean(1:5)<\/code>, which is 3, is not the same as the mean of the subgroup means <code>mean(1:2)<\/code> and <code>mean(3:5)<\/code>: <code>mean(c(mean(1:2), mean(3:5)))<\/code> is 2.75. The two subgroups differ in size (2 and 3 values), and an unweighted mean of means ignores that. A mean can, however, be recovered exactly as the total sum divided by the total count, and both of these can be aggregated step by step.<\/p>\n<p>So, the approach below computes only sums and counts in the first and second aggregation steps and derives the averages for the selected columns afterwards.<br \/>\nThe data are taken from the OP&#8217;s other question. Furthermore, the <code>by =<\/code> parameter is simplified for demonstration, and <code>date_range<\/code> has been adapted to the sample data.<\/p>\n<pre><code>library(data.table, warn.conflicts = FALSE)\nlibrary(magrittr)   ### MODIFIED\n# library(lubridate, warn.conflicts = FALSE)   ### MODIFIED\n\n################\n## PARAMETERS ##\n################\n\n# Set path of major source folder for raw transaction data\nin_directory &lt;- \"Raw Data\"\n\n# List names of sub-folders (currently grouped by first two characters of CUST_ID)\nin_subfolders &lt;- list(\"AA-CA\", \"CB-HZ\", \"IA-IL\", \"IM-KZ\", \"LA-MI\", \"MJ-MS\",\n                      \"MT-NV\", \"NW-OH\", \"OI-PZ\", \"QA-TN\", \"TO-UZ\",\n                      \"VA-WA\", \"WB-ZZ\")\n\n# Set location for output\nout_directory &lt;- \"YTD Master\"\nout_filename &lt;- \"OUTPUT.csv\"\n\n\n# Set beginning and end of date range to be collected - year-month-day format\ndate_range &lt;- c(\"2017-01-01\", \"2017-06-30\")   ### MODIFIED\n\n# Enable or disable filtering of raw files to only grab items bought within certain months to save space.\n# If false, all files will be scanned for unique items, which will take longer and be a larger file.\n# date_filter &lt;- TRUE   ### MODIFIED\n\n\n##########\n## CODE ##\n##########\n\nstarttime &lt;- Sys.time()\n\n# create vector of filenames to be processed\nin_filenames &lt;- list.files(\n  file.path(in_directory, in_subfolders), \n  pattern = \"\\\\.txt$\", \n  full.names = TRUE, 
\n  recursive = TRUE)\n\n# keep only files whose names contain a year-month (YYYY-MM) from date_range\nselected_in_filenames &lt;- \n  seq(as.Date(date_range[1]), \n      as.Date(date_range[2]), by = \"1 month\") %&gt;% \n  format(\"%Y-%m\") %&gt;% \n  lapply(function(x) stringr::str_subset(in_filenames, x)) %&gt;% \n  unlist()\n\n\n# read and aggregate each file separately (first aggregation step: counts and sums)\nmastertable &lt;- rbindlist(\n  lapply(selected_in_filenames, function(fn) {\n    message(\"Processing file: \", fn)\n    temptable &lt;- fread(fn,\n                       colClasses = c(CUSTOMER_TIER = \"character\"),\n                       na.strings = \"\")\n\n    # aggregate the file, keeping only rows within date_range\n    temptable[INVOICE_DT %between% date_range, \n              c(.(N = .N), lapply(.SD, sum)), \n              by = .(CUST_ID, \n                     QTR = quarter(INVOICE_DT), YEAR = year(INVOICE_DT)), \n              .SDcols = c(\"Ext Sale\", \"CE100\")] \n  })\n)[\n  # second aggregation step: add up counts and sums over all files\n  , lapply(.SD, sum), \n  by = .(CUST_ID, QTR, YEAR), \n  .SDcols = c(\"N\", \"Ext Sale\", \"CE100\")]\n# turn the sums of the selected columns into averages by dividing by the total count N\ncols_avg &lt;- c(\"CE100\")\nmastertable[, (cols_avg) := lapply(.SD, function(x) x\/N), \n    .SDcols = cols_avg]\n\n# Save final table\nprint(\"Saving master table\")\nfwrite(mastertable, file.path(out_directory, out_filename))\n# rm(mastertable)   ### MODIFIED\n\nprint(Sys.time()-starttime)\n\nmastertable\n<\/code><\/pre>\n<blockquote>\n<pre><code>     CUST_ID QTR YEAR N Ext Sale      CE100\n1: AK0010001   1 2017 4  427.803 29.4119358\n2: CO0020001   1 2017 2 1540.300         NA\n3: CO0010001   1 2017 2 -179.765  0.0084625\n<\/code><\/pre>\n<\/blockquote>\n<p>Missing values are included in the aggregates. Without <code>na.rm = TRUE<\/code>, <code>sum()<\/code> returns <code>NA<\/code> for any group that contains a missing value, as for <code>CO0020001<\/code> above. How missing values should be handled is a business decision. 
If missing values are to be excluded from the aggregation, the staggered computation of averages becomes more involved. Each step would need <code>sum(x, na.rm = TRUE)<\/code> plus a separate count of non-missing values per column, and the average would then be the final sum divided by that count instead of by <code>N<\/code>.<\/p>\n<\/div>\n<div class=\"mt24\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The crucial point in OP&#8217;s approach is the staggered aggregation (see the related question row not consolidating duplicates in R when using multiple months in Date Filter). The OP wants to aggregate data across a number of files which apparently are too large to be loaded altogether and combined into a large data.table. Instead, &#8230; <a title=\"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column\" class=\"read-more\" href=\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\" aria-label=\"More on [Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[320],"tags":[1350,1351,321,1097],"class_list":["post-5398","post","type-post","status-publish","format-standard","hentry","category-solved","tag-data-table","tag-mean","tag-r","tag-sum"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column - JassWeb<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column - JassWeb\" \/>\n<meta property=\"og:description\" content=\"The crucial point in OP&#8217;s approach is the staggered aggregation (see the related question row not consolidating duplicates in R when using multiple months in Date Filter). The OP wants to aggregate data across a number of files which apparently are too large to be loaded altogether and combined into a large data.table. Instead, ... Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\" \/>\n<meta property=\"og:site_name\" content=\"JassWeb\" \/>\n<meta property=\"article:published_time\" content=\"2022-08-28T06:42:06+00:00\" \/>\n<meta name=\"author\" content=\"Kirat\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kirat\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\"},\"author\":{\"name\":\"Kirat\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31\"},\"headline\":\"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column\",\"datePublished\":\"2022-08-28T06:42:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\"},\"wordCount\":265,\"publisher\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\"},\"keywords\":[\"data.table\",\"mean\",\"r\",\"sum\"],\"articleSection\":[\"Solved\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\",\"url\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\",\"name\":\"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column - 
JassWeb\",\"isPartOf\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#website\"},\"datePublished\":\"2022-08-28T06:42:06+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/jassweb.com\/solved\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/jassweb.com\/solved\/#website\",\"url\":\"https:\/\/jassweb.com\/solved\/\",\"name\":\"JassWeb\",\"description\":\"Build High-quality Websites\",\"publisher\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/jassweb.com\/solved\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\",\"name\":\"Jass 
Web\",\"url\":\"https:\/\/jassweb.com\/solved\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png\",\"contentUrl\":\"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png\",\"width\":693,\"height\":132,\"caption\":\"Jass Web\"},\"image\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31\",\"name\":\"Kirat\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750\",\"contentUrl\":\"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750\",\"caption\":\"Kirat\"},\"sameAs\":[\"http:\/\/jassweb.com\"],\"url\":\"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column - JassWeb","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/","og_locale":"en_US","og_type":"article","og_title":"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column - JassWeb","og_description":"The crucial point in OP&#8217;s approach is the staggered aggregation (see the related question row not consolidating duplicates in R when using multiple months in Date Filter). The OP wants to aggregate data across a number of files which apparently are too large to be loaded altogether and combined into a large data.table. Instead, ... Read more","og_url":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/","og_site_name":"JassWeb","article_published_time":"2022-08-28T06:42:06+00:00","author":"Kirat","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kirat","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/#article","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/"},"author":{"name":"Kirat","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31"},"headline":"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column","datePublished":"2022-08-28T06:42:06+00:00","mainEntityOfPage":{"@id":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/"},"wordCount":265,"publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"keywords":["data.table","mean","r","sum"],"articleSection":["Solved"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/","url":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/","name":"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column - 
JassWeb","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/#website"},"datePublished":"2022-08-28T06:42:06+00:00","breadcrumb":{"@id":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/jassweb.com\/solved\/solved-i-want-to-summarize-by-a-column-and-then-have-it-take-the-sum-of-1-column-and-the-mean-of-another-column\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/jassweb.com\/solved\/"},{"@type":"ListItem","position":2,"name":"[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column"}]},{"@type":"WebSite","@id":"https:\/\/jassweb.com\/solved\/#website","url":"https:\/\/jassweb.com\/solved\/","name":"JassWeb","description":"Build High-quality Websites","publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/jassweb.com\/solved\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/jassweb.com\/solved\/#organization","name":"Jass Web","url":"https:\/\/jassweb.com\/solved\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/","url":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","contentUrl":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","width":693,"height":132,"caption":"Jass 
Web"},"image":{"@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31","name":"Kirat","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/image\/","url":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750","contentUrl":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750","caption":"Kirat"},"sameAs":["http:\/\/jassweb.com"],"url":"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/5398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/comments?post=5398"}],"version-history":[{"count":0,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/5398\/revisions"}],"wp:attachment":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/media?parent=5398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/categories?post=5398"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/tags?post=5398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}