[Solved] delete rows by date and add file name column for multiple csv


This function should be very close to what you want. I’ve tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned then, it was untested. I’m not sure whether you’re using Python 2 or 3, but this version worked properly in both.

Another change from the previous version is that the optional keyword argument for the date has been renamed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which it is possible to do something, which is the opposite of the way you used it in your question. Also note that any date provided should be a string, i.e. start_date="20140401", not an integer, because it is compared directly against the text in the first column of each row.
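To illustrate why a string works here, below is a minimal sketch of the comparison the function relies on. It assumes the dates are in zero-padded YYYYMMDD form, as in the example value above; the sample rows are made up for illustration. In that form, ordinary string comparison orders dates the same way the calendar does. In Python 3, comparing a string with an integer raises a TypeError instead.

```python
# Zero-padded YYYYMMDD strings sort in the same order as the dates they represent,
# so plain string comparison is enough to filter rows.
start_date = "20140401"

# Hypothetical sample rows; the first column is the date, as the function assumes.
rows = [
    ["20140331", "a"],
    ["20140401", "b"],
    ["20140402", "c"],
]

# Keep only rows dated on or after start_date.
kept = [row for row in rows if row[0] >= start_date]
print(kept)  # [['20140401', 'b'], ['20140402', 'c']]
```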

One enhancement is that it will now create the output directory if one is specified but doesn’t already exist.

import csv
import os
import sys

def open_csv(filename, mode="r"):
    """ Open a csv file in proper mode depending on Python verion. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=""))

def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        contents = [row for row in reader
                        if not start_date or row[0] >= start_date]

    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename)  # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.
        os.makedirs(new_dir)

    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)

        # Add name of new column to header.
        header = ['Pipe'] + header  # Prepend new column name.
        writer.writerow(header)

        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext(basename)[0]]

        # Process each row of the body by prepending data for new column to it.
        writer.writerows(new_column + row for row in contents)
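
Since the question is about multiple csv files, here is a rough sketch of one way to apply the function to every file in a folder. The helper name process_all and the folder names in the usage comment are my own assumptions, not part of the code above. The processing function is passed in as an argument so the snippet stands on its own.

```python
import glob
import os

def process_all(src_dir, process, start_date=None, new_dir=None):
    """Call process(path, start_date=..., new_dir=...) for each .csv in src_dir.

    `process` is expected to be the process_file function shown above; it is
    passed in as a parameter so this sketch stays self-contained.
    Returns the list of paths that were processed.
    """
    # Collect the csv files up front, in a stable order.
    paths = sorted(glob.glob(os.path.join(src_dir, "*.csv")))
    for path in paths:
        process(path, start_date=start_date, new_dir=new_dir)
    return paths

# Hypothetical usage, assuming process_file is defined as above and the
# input files live in a folder named "data":
# process_all("data", process_file, start_date="20140401", new_dir="output")
```

Keep in mind that when new_dir is omitted, process_file writes each result back over the original file, so passing a separate output folder is the safer choice while testing.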
