{"id":929,"date":"2016-10-06T23:58:43","date_gmt":"2016-10-06T21:58:43","guid":{"rendered":"http:\/\/www.netexpertise.eu\/en\/?p=929"},"modified":"2016-10-06T23:58:43","modified_gmt":"2016-10-06T21:58:43","slug":"processing-csv-files-with-perl-and-bash","status":"publish","type":"post","link":"http:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html","title":{"rendered":"Processing CSV Files with Perl and Bash"},"content":{"rendered":"<p>Olivier, a friend of mine, had to parse a CSV file and took the opportunity to benchmark three programming languages.<br \/>\n&nbsp;<br \/>\nThe file lists server names and disk sizes, which he sums in a hash table to get the total disk space for each server. On his <a href=\"https:\/\/blog.owulveryck.info\/2016\/09\/11\/processing-csv-files-with-golang-python-and-perl\/index.html\">blog<\/a>, he assumes that Perl, Python and Golang are much faster than Bash. He is definitely right, but how much faster?<br \/>\n&nbsp;<br \/>\nThe following (slightly modified) Perl script processed 600k lines in less than a second. 
Not bad, given that Perl is an interpreted language.<\/p>\n<pre>\r\n#!\/usr\/bin\/perl\r\nmy $file = 'sample.csv';\r\nmy %data;\r\nopen(my $fh, '<', $file) or die \"Can't read file '$file' [$!]\\n\";\r\nwhile ( my ($server,$value)=split(\/,\/,<$fh>)) {\r\n    $data{$server} += $value;\r\n}\r\nclose ($fh);\r\n<\/pre>\n<p>&nbsp;<br \/>\nNow, here&#8217;s similar code in Bash:<\/p>\n<pre>\r\n#!\/bin\/bash\r\nfile=sample.csv\r\ndeclare -A data\r\nwhile read -r line; do\r\n  values=($(echo $line|awk -F, '{print $1\" \"$2}'))\r\n  (( data[${values[0]}] += ${values[1]} ))\r\ndone < \"$file\"\r\n<\/pre>\n<p>This version took over 19 minutes to process the file, which is around 1200 times slower!<br \/>\n&nbsp;<br \/>\nLet's see if we can improve the script's performance.<br \/>\nThe man page for the read command states something of interest:<br \/>\n\"The characters in IFS are used to split the line into words\".<br \/>\nSetting IFS to a comma lets read -a split each line straight into the $line array, saving the hassle of parsing each line with awk and using a temporary variable.<\/p>\n<pre>\r\n#!\/bin\/bash\r\nIFS=','\r\nfile=sample.csv\r\ndeclare -A data\r\nwhile read -a line; do\r\n  (( data[${line[0]}] += ${line[1]} ))\r\ndone < \"$file\"\r\n<\/pre>\n<p>This new version runs in just 17 seconds! That is still 17 times slower than Perl, but about 70 times faster than the original version.<br \/>\n&nbsp;<br \/>\nNo doubt Perl and Python are much faster than shell languages, but small details matter a great deal when it comes to performance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Olivier, a friend of mine, had to parse a CSV file and took the opportunity to benchmark the performance of 3 programming languages. &nbsp; The file contains server names and disks he needs to add up into a hash table in order to get the total disk space for each server. 
He assumes on his [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[11],"tags":[280,386,27,104,181],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.8.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Netexpertise - Processing CSV Files with Perl and Bash<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Netexpertise - Processing CSV Files with Perl and Bash\" \/>\n<meta property=\"og:description\" content=\"Olivier, a friend of mine, had to parse a CSV file and took the opportunity to benchmark the performance of 3 programming languages. &nbsp; The file contains server names and disks he needs to add up into a hash table in order to get the total disk space for each server. 
He assumes on his [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html\" \/>\n<meta property=\"og:site_name\" content=\"Netexpertise\" \/>\n<meta property=\"article:published_time\" content=\"2016-10-06T21:58:43+00:00\" \/>\n<meta name=\"author\" content=\"dave\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@netexpertise\" \/>\n<meta name=\"twitter:site\" content=\"@netexpertise\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html\",\"url\":\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html\",\"name\":\"Netexpertise - Processing CSV Files with Perl and Bash\",\"isPartOf\":{\"@id\":\"http:\/\/www.netexpertise.eu\/en\/#website\"},\"datePublished\":\"2016-10-06T21:58:43+00:00\",\"dateModified\":\"2016-10-06T21:58:43+00:00\",\"author\":{\"@id\":\"http:\/\/www.netexpertise.eu\/en\/#\/schema\/person\/cb4cd666549d22e9070ec1cfc1a496fa\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/www.netexpertise.eu\/en\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Processing CSV Files with Perl and 
Bash\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/www.netexpertise.eu\/en\/#website\",\"url\":\"http:\/\/www.netexpertise.eu\/en\/\",\"name\":\"Netexpertise\",\"description\":\"Systems \/ Networks \/ DevOps\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/www.netexpertise.eu\/en\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/www.netexpertise.eu\/en\/#\/schema\/person\/cb4cd666549d22e9070ec1cfc1a496fa\",\"name\":\"dave\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/www.netexpertise.eu\/en\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/1.gravatar.com\/avatar\/1129916e1f4955bd632f27f836f64e55?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/1.gravatar.com\/avatar\/1129916e1f4955bd632f27f836f64e55?s=96&d=mm&r=g\",\"caption\":\"dave\"},\"sameAs\":[\"http:\/\/www.netexpertise.eu\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Netexpertise - Processing CSV Files with Perl and Bash","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html","og_locale":"en_US","og_type":"article","og_title":"Netexpertise - Processing CSV Files with Perl and Bash","og_description":"Olivier, a friend of mine, had to parse a CSV file and took the opportunity to benchmark the performance of 3 programming languages. &nbsp; The file contains server names and disks he needs to add up into a hash table in order to get the total disk space for each server. 
He assumes on his [&hellip;]","og_url":"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html","og_site_name":"Netexpertise","article_published_time":"2016-10-06T21:58:43+00:00","author":"dave","twitter_card":"summary_large_image","twitter_creator":"@netexpertise","twitter_site":"@netexpertise","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html","url":"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html","name":"Netexpertise - Processing CSV Files with Perl and Bash","isPartOf":{"@id":"http:\/\/www.netexpertise.eu\/en\/#website"},"datePublished":"2016-10-06T21:58:43+00:00","dateModified":"2016-10-06T21:58:43+00:00","author":{"@id":"http:\/\/www.netexpertise.eu\/en\/#\/schema\/person\/cb4cd666549d22e9070ec1cfc1a496fa"},"breadcrumb":{"@id":"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.netexpertise.eu\/en\/systems\/linux\/processing-csv-files-with-perl-and-bash.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/www.netexpertise.eu\/en"},{"@type":"ListItem","position":2,"name":"Processing CSV Files with Perl and Bash"}]},{"@type":"WebSite","@id":"http:\/\/www.netexpertise.eu\/en\/#website","url":"http:\/\/www.netexpertise.eu\/en\/","name":"Netexpertise","description":"Systems \/ Networks \/ DevOps","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/www.netexpertise.eu\/en\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/www.netexpertise.eu\/en\/#\/schema\/person\/cb4cd666549d22e9070ec1cfc1a496fa","name":"dave","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/www.netexpertise.eu\/en\/#\/schema\/person\/image\/","url":"http:\/\/1.gravatar.com\/avatar\/1129916e1f4955bd632f27f836f64e55?s=96&d=mm&r=g","contentUrl":"http:\/\/1.gravatar.com\/avatar\/1129916e1f4955bd632f27f836f64e55?s=96&d=mm&r=g","caption":"dave"},"sameAs":["http:\/\/www.netexpertise.eu"]}]}},"_links":{"self":[{"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/posts\/929"}],"collection":[{"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/comments?post=929"}],"version-history":[{"count":0,"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/posts\/929\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/media?parent=929"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/categories?post=929"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.netexpertise.eu\/en\/wp-json\/wp\/v2\/tags?post=929"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}