[Solved] Convert several YAML files to CSV

Accepted Answer

I would use a Python YAML library such as PyYAML to do this. It can be installed using:

pip install pyyaml

The files you have given could then be converted to CSV as follows:

import csv
import yaml

fieldnames = ['Name', 'Author', 'Written', 'About', 'Body']

with open('output.csv', 'w', newline="") as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()

    for filename in ['Test_Co_1.txt', 'Test_Co_2.txt']:
        with open(filename) as f_input:
            data = yaml.safe_load(f_input)

        name = data[0]['Name']

        for entry in data:
            key = next(iter(entry))

            if key.startswith('Note') or key.startswith('Comment'):
                row = {'Name' : name}

                # Copy whichever of the wanted fields are present in each item
                for d in entry[key]:
                    for get in ['Author', 'Written', 'About', 'Body']:
                        if get in d:
                            row[get] = d[get]

                csv_output.writerow(row)

This writes a standard CSV format (i.e. commas between fields, with quotes added around any field that contains a newline or a comma).
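To see the quoting behaviour in isolation, here is a small sketch that writes one row to an in-memory buffer instead of a file. The field values are made up for illustration; only the csv calls match the script above.

```python
import csv
import io

def write_rows(rows, fieldnames):
    """Write rows with csv.DictWriter into a string and return it."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up row: the Body field contains both a comma and a newline
rows = [{'Name': 'Test Co', 'Body': 'Line one, with a comma\nline two'}]
text = write_rows(rows, ['Name', 'Body'])
print(text)

# Reading it back shows the quoted field survives unchanged
read_back = list(csv.DictReader(io.StringIO(text, newline="")))
print(read_back[0]['Body'])
```

Only the Body field ends up quoted, as it is the only one that needs it.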

To understand this, I would recommend adding some print statements to see what things look like. For example, data holds the entire file contents as nested lists and dictionaries. It is then a case of extracting the bits you need.
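As a rough illustration of that kind of inspection, the sketch below prints the first key and value type of each top-level entry. The sample structure is hypothetical: it is only a guess at what yaml.safe_load() might return for files like yours, based on how the script indexes data, so substitute your real data.

```python
# Hypothetical sample of what yaml.safe_load() might return for one file.
# The real files are not shown here, so this shape is an assumption.
sample = [
    {'Name': 'Test Co 1'},
    {'Note 1': [{'Author': 'Alice'}, {'Written': '2019-01-01'}, {'Body': 'First note'}]},
    {'Comment 1': [{'Author': 'Bob'}, {'About': 'Note 1'}, {'Body': 'A reply'}]},
]

def describe(data):
    """Return one summary line per top-level entry: its first key and value type."""
    lines = []
    for entry in data:
        key = next(iter(entry))
        lines.append(f"{key}: {type(entry[key]).__name__}")
    return lines

for line in describe(sample):
    print(line)
```

Running the same function on your own data should quickly show which entries are plain values and which are lists that need a further loop.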

To apply this to all of your YAML files, I would replace the list of filenames with a call to glob.glob('*.txt')
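A minimal sketch of that change follows. The helper name find_input_files and its folder parameter are my own additions for illustration. Sorting is optional and just makes the row order predictable.

```python
import glob
import os

def find_input_files(folder='.', pattern='*.txt'):
    """Return all files in folder matching pattern, in sorted order."""
    return sorted(glob.glob(os.path.join(folder, pattern)))

# In the script above, the loop header would then become:
#     for filename in find_input_files():
print(find_input_files())
```

Since output.csv does not match *.txt, it will not be picked up as an input even if it is written to the same folder.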