Calculate the largest field in a big delimited data file


I wanted to upload a delimited file to a MySQL db, but hit the problem that the file contained nearly 3000 rows and I had no schema for the required size of each field.

Therefore I needed to traverse the file and calculate the length of each field, then at the end print the largest length found for each column.



Perl to the rescue! Pretty easy in Perl, after scratching my head attempting with awk. 🙂

Anyway, the code is below. I’m using the pipe symbol “|” as the delimiter – so just substitute your own delimiter, cat your file, and pipe it through the script.

In the example I also show how you can manipulate a file to produce pipe-delimited fields in the first place.
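For instance, if your file happened to be comma-delimited, one quick way to normalise it to pipes first (a minimal sketch – the sample data is made up, and it assumes no commas appear inside field values) is a simple sed substitution:

```shell
# Convert a simple comma-delimited file to pipe-delimited,
# assuming no commas are embedded inside the field values themselves.
printf 'alpha,bb,c\nlonger,field,values\n' | sed 's/,/|/g'
# prints:
# alpha|bb|c
# longer|field|values
```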



This is how to run the script – basically you just need to pipe your file through it:

cat yourfile | ./

Here is the code.



#!/usr/bin/perl

# Read each line from STDIN, split it on the pipe delimiter,
# and remember the longest value seen in each column.
while(<STDIN>) {
    chomp;
    @thisline = split(/\|/);
    for($i=0;$i<=$#thisline;$i++) {
        $thislength = length($thisline[$i]);
        if($thislength > $highest[$i]) { $highest[$i]=$thislength; }
    }
}

# Print the largest length found for each column.
for($i=0;$i<=$#highest;$i++) {
    print "Field ", $i+1, ": $highest[$i]\n";
}
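Incidentally, the same per-column traversal can also be sketched in awk, using an array much like the Perl version does (the sample input here is hypothetical, and the pipe delimiter is hard-coded via -F):

```shell
# Track the longest value seen per column, then print the results.
printf 'a|bb|ccc\ndddd|e|ff\n' | awk -F'|' '
    { for (i = 1; i <= NF; i++) if (length($i) > max[i]) max[i] = length($i) }
    END { for (i = 1; i <= NF; i++) print "Field " i ": " max[i] }'
# prints:
# Field 1: 4
# Field 2: 2
# Field 3: 3
```

Note that NF in the END block is taken from the last record, so this sketch assumes every line has the same number of fields.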


So for example, if I want to find the largest fields in one of my web logs - for crunching into a db:

  • First off I only want lines starting with a space, then a number.
  • Next I need to replace all runs of multiple spaces between fields with a single space.
  • Then I replace the single spaces between each field with a pipe.
  • Lastly I pump it through the script, which gives me the largest size for each field.

[marcus@bree]/var/log/httpd% grep "^ [0-9]" | sed -e 's/  */ /g' -e 's/ /|/g' |
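The cleanup stages can be tried in isolation on a made-up log fragment (the field layout here is hypothetical, not my actual log format) – note how the leading space on the line becomes a leading pipe, i.e. an empty first field:

```shell
# Squeeze runs of spaces down to one space, then turn each space into a pipe.
printf ' 192.168.0.1   GET  /index.html   200\n' \
    | sed -e 's/  */ /g' -e 's/ /|/g'
# prints:
# |192.168.0.1|GET|/index.html|200
```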





