I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All text files have a 4-line header that needs to be stripped out. The data lines have various delimiters, including " (quote), - (dash), : (colon), and blank spaces. I found it difficult to handle these different delimiters in C++, so I decided to try Python as it seemed easier.
I wrote some code to test parsing a single line of data, and it works fine. However, I couldn't make it work for the entire file. I was using the replace method on a text string for a single line, but my current implementation reads the text file as a list, and the replace method is not available for list objects.
I am new to Python and got stuck here. Can anyone help me resolve this?
Thanks!
Code for Parsing:
# function for parsing the data
def data_parser(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text
# open input/output files
inputfile = open('test.dat')
outputfile = open('test.csv', 'w')
my_text = inputfile.readlines()[4:] # reads the whole text file, skipping the first 4 lines
# sample text string, just for demonstration to show how the data looks
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'
# dictionary definition to handle the date block delimited by dashes and ensure negative numbers are not affected
reps = {'"NAN"': 'NAN', '"': '', '0-': '0,', '1-': '1,', '2-': '2,', '3-': '3,', '4-': '4,', '5-': '5,', '6-': '6,', '7-': '7,', '8-': '8,', '9-': '9,', ' ': ',', ':': ','}
txt = data_parser(my_text, reps)
outputfile.writelines(txt)
inputfile.close()
outputfile.close()
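
Since readlines() returns a list of strings, one possible way around the error is to call data_parser on each line rather than on the whole list. The following is only a minimal sketch under some assumptions: parse_lines is a hypothetical helper name, the sample list is made up to stand in for the file contents, and items() is used in place of iteritems() so the snippet runs on both Python 2 and 3.

```python
def data_parser(text, dic):
    # Apply every replacement in dic to a single string.
    for old, new in dic.items():  # items() works in Python 2 and 3
        text = text.replace(old, new)
    return text


def parse_lines(lines, dic, header_lines=4):
    # Hypothetical helper: skip the header, then parse each remaining line.
    return [data_parser(line, dic) for line in lines[header_lines:]]


reps = {'"NAN"': 'NAN', '"': '', ' ': ',', ':': ','}
# A dash right after a digit belongs to the date block;
# a minus sign follows a comma, so negative numbers are left alone.
for digit in '0123456789':
    reps[digit + '-'] = digit + ','

# Made-up sample standing in for the result of inputfile.readlines()
sample = [
    'header 1\n', 'header 2\n', 'header 3\n', 'header 4\n',
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",-0.3489428\n',
]
parsed = parse_lines(sample, reps)
```

With the files from the question, the equivalent call would be outputfile.writelines(parse_lines(inputfile.readlines(), reps)). Because parse_lines already skips the header, the [4:] slice on readlines() would no longer be needed.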