[Solved] Conversion of large .csv file to .prn (around 3.5 GB) in Ubuntu using bash

Question

It’s not clear from your question as you didn’t provide sample input/output we could test against but it SOUNDS like all you’re trying to do is this:

$ cat tst.awk
BEGIN {
    split("7 10 15 12 4",w)
    FPAT="[^,]*|\"[^\"]*\""
}
{
    gsub(/""/,RS)
    for (i=1;i<=NF;i++) {
        gsub(/"/,"",$i)
        gsub(RS,"\"",$i)
        printf "<%-*s>", w[i], substr($i,1,w[i])
    }
    print ""
}

$ cat file
abcde,"ab,c,de","ab ""c"" de","a,""b"",c",ab
abcdefghi,"xyab,c,de","xyzab ""c"" de",abc,abcdefg

$ awk -f tst.awk file
<abcde  ><ab,c,de   ><ab "c" de      ><a,"b",c     ><ab  >
<abcdefg><xyab,c,de ><xyzab "c" de   ><abc         ><abcd>

Obviously I added the < and > around each field just to make it clear where each field starts/ends, you’d remove that for your real application and I’m creating the array w to hold specific widths for each field as idk where you get that from otherwise.

The above uses GNU awk for FPAT, with other awks it’d be a while(match()) loop.

Accepted Answer

It’s not clear from your question as you didn’t provide sample input/output we could test against but it SOUNDS like all you’re trying to do is this:

$ cat tst.awk
BEGIN {
    split("7 10 15 12 4",w)
    FPAT="[^,]*|\"[^\"]*\""
}
{
    gsub(/""/,RS)
    for (i=1;i<=NF;i++) {
        gsub(/"/,"",$i)
        gsub(RS,"\"",$i)
        printf "<%-*s>", w[i], substr($i,1,w[i])
    }
    print ""
}

$ cat file
abcde,"ab,c,de","ab ""c"" de","a,""b"",c",ab
abcdefghi,"xyab,c,de","xyzab ""c"" de",abc,abcdefg

$ awk -f tst.awk file
<abcde  ><ab,c,de   ><ab "c" de      ><a,"b",c     ><ab  >
<abcdefg><xyab,c,de ><xyzab "c" de   ><abc         ><abcd>

Obviously I added the < and > around each field just to make it clear where each field starts/ends, you’d remove that for your real application and I’m creating the array w to hold specific widths for each field as idk where you get that from otherwise.

The above uses GNU awk for FPAT, with other awks it’d be a while(match()) loop.