[Solved] Python. Parse csv file. Append value to previous row if next row have duplicate value [closed]

This can be done by simply parsing the file line by line and checking if the current asset is the same as the previous one.

First, store each asset and its tags as a two-element list inside transformed_data, so that the tags of the most recently added row are easy to reach, like so:

[ [ asset1, tag1 ], [ asset2, tag2 ], ... ]

Note that I assume each row contains only an asset and a tag, separated by whitespace (for example: Laptop, 856231).

# Some constants to improve readability
ASSET_FIELD = 0
TAG_FIELD = 1
# Open the file to parse
with open('data.csv') as csv_file:
    transformed_data = list()
    # Skip the headers
    for line in csv_file.readlines()[1:]:
        # Extract the asset and its tag
        asset, tag = line.split()
        # If the asset matches the last asset in transformed_data, i.e. the previous asset read
        if transformed_data and transformed_data[-1][ASSET_FIELD] == asset:
            # Append to previous tag
            transformed_data[-1][TAG_FIELD] += ', ' + tag
        # Else, simply append it
        else:
            transformed_data.append([asset, tag])

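This gives the following. Each asset keeps its trailing comma, because split() with no arguments splits on whitespace rather than on commas: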

[['Laptop,', '856231'], ['Desktop,', '665786, 125548'], ['Laptop,', '657843']]
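
As a variant, the same merge of consecutive duplicates can be sketched with the standard csv module and itertools.groupby, which also avoids the trailing comma visible on each asset in the output above. This is only a minimal sketch, not the approach used in this answer: the sample text is reconstructed from that output, and the header names (Asset, Tag) and the function name merge_consecutive are assumptions made for illustration.

```python
import csv
import io
from itertools import groupby

# Hypothetical sample input, reconstructed from the output shown above;
# the header names are assumed
SAMPLE = """Asset, Tag
Laptop, 856231
Desktop, 665786
Desktop, 125548
Laptop, 657843
"""

def merge_consecutive(csv_text):
    """Merge the tags of consecutive rows that share the same asset."""
    # skipinitialspace drops the space that follows each comma
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the header row
    # Ignore blank lines, which csv.reader yields as empty lists
    rows = (row for row in reader if row)
    merged = []
    # groupby only groups adjacent rows, so non-adjacent duplicates stay separate
    for asset, group in groupby(rows, key=lambda row: row[0]):
        merged.append([asset, ', '.join(row[1] for row in group)])
    return merged

print(merge_consecutive(SAMPLE))
```

Because groupby only merges adjacent rows, the two Laptop entries stay separate, which matches the behaviour of the loop above.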

Now, if we want, we can convert it back to a list of strings:

# Join each row into a string
transformed_data = [ ' '.join(row) for row in transformed_data]
print(transformed_data)

And, this shows:

['Laptop, 856231', 'Desktop, 665786, 125548', 'Laptop, 657843']

You can do whatever you want with it, and even write it back to the file. Remember to reattach the headers!
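
Writing the result back with the header reattached could look like the sketch below. The header text 'Asset, Tag', the output file name merged.csv and the helper write_rows are assumptions, since the original file's header is not shown here; the example writes to a temporary directory so that it does not overwrite any real data.

```python
import os
import tempfile

def write_rows(path, header, rows):
    """Write the header line followed by one merged row per line."""
    with open(path, 'w') as out_file:
        out_file.write(header + '\n')
        for row in rows:
            out_file.write(row + '\n')

# Rows in the joined-string form shown above
rows = ['Laptop, 856231', 'Desktop, 665786, 125548', 'Laptop, 657843']

# Write to a temporary location; 'Asset, Tag' is an assumed header
out_path = os.path.join(tempfile.mkdtemp(), 'merged.csv')
write_rows(out_path, 'Asset, Tag', rows)

# Read the file back to check what was written
with open(out_path) as check_file:
    print(check_file.read())
```

Keep in mind that a merged row such as 'Desktop, 665786, 125548' contains more comma-separated fields than the header, so a strict CSV reader will see three columns on that line.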

Edit: If you are getting \n in the strings (which can happen if you split each line on ',' instead of on whitespace), simply remove it when joining:

# Join each row into a string
transformed_data = [ ' '.join(row).replace('\n','') for row in transformed_data]
print(transformed_data)
