[Solved] Process unstructured and multiple line CSV in hadoop

Because you’re coping with multi-line data you cannot use a simple TextInputFormat to access your data. Thus you need to use a custom InputFormat for CSV files. Currently there is no built-in way of processing multi-line CSV files in Hadoop (see https://issues.apache.org/jira/browse/MAPREDUCE-2208), but luckily there’s come code on github you can try: https://github.com/mvallebr/CSVInputFormat. As far … Read more