[problem]
I wanted to load a delimited file into a MySQL database, but hit a problem: the file contained nearly 3,000 rows and I had no schema telling me the required size of each field.
I therefore needed to traverse the file and calculate the length of each field, then at the end print the largest length found for each column.
[/problem]
[solution]
Perl to the rescue! Pretty easy in Perl, after scratching my head attempting it with awk. 🙂
Anyway, the code is under the example tab. I'm using the pipe symbol "|" as the delimiter - so just substitute your own delimiter, cat your file and pipe it through this script.
In the example I also show how to manipulate a file to produce pipe-delimited fields.
[/solution]
[example]
This is how to run the script. Basically you just need to pipe your output through find_largest.pl.
cat yourfile | ./find_largest.pl
Here is the code.
#!/usr/bin/perl
use strict;
use warnings;

my @highest = ();

while (<STDIN>) {
    chomp;                          # drop the trailing newline
    my @thisline = split(/\|/);     # the pipe must be escaped in the regex
    for my $i (0 .. $#thisline) {
        my $thislength = length($thisline[$i]);
        if (!defined $highest[$i] || $thislength > $highest[$i]) {
            $highest[$i] = $thislength;
        }
    }
}

print(join("|", @highest) . "\n");
exit(0);
__END__
So for example, if I want to find the largest fields in one of my web logs - for crunching into a db:
- First off, I only want lines starting with a space, then a number.
- Next, I replace every run of multiple spaces between fields with a single space.
- Then I replace each remaining space between fields with a pipe.
- Lastly, I pump it through find_largest.pl, which gives me the largest size for each field.
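The two sed steps can be tried on a single line first. This sample line is made up (a leading space, then runs of spaces between fields), just to show what each expression does:

```shell
# First expression squeezes each run of spaces down to one space;
# the second turns every remaining space into a pipe.
echo ' 200   GET    /index.html' | sed -e 's/  */ /g' -e 's/ /|/g'
# prints: |200|GET|/index.html
```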
[marcus@bree]/var/log/httpd% grep "^ [0-9]" access_log.tools
| sed -e 's/  */ /g' -e 's/ /|/g' |
~/Perl_Bin/find_largest.pl
|14|1|1|21|6|5|103|9|3|5
[/example]
[reference]
[tags]Perl, Data Analysis, Perl Coding School[/tags]
[/reference]