Browse > Home / Archive: May 2008

| Subcribe via RSS

Loading a byte-length delimited text file in MySQL and CF

May 10th, 2008 | No Comments | Posted in ColdFusion, MySQL, Technology

I’ve had a few projects that require loading large text files with byte-length fields into MySQL. That means that there are no delimiters for the fields, they’re just always a certain number of characters. As far as I can tell, the MySQL tools need field delimiters in order to use load_data_infile or the import tool. Therefore my current solution was to read each line, parse it, insert it, and repeat. In each case I also had to do some transformations on the data before it was final, like converting ISBN from short to long formats.
I’ve always know this was slow, particularly with 500,000 records in a file to parse. I’m thinking of two alternatives, and I’m going to test to see which is faster. One would be mysqlimport the file into a temp table where each line is a single field. Then I would wither use CF or SQL to parse the longer fields one at a time. I’m not sure that this removes the multiple insert bottleneck, I expect it wouldn’t be much better. The second would be to read and parse each line and then write it back to a text file with proper delimiters and all the data converted. Then I’d mysqlimport the text file. The question here is which is faster – multiple inserts or appending lines to a large text file?
Update – I figured loading the file would be faster than all those queries, but the difference was pretty stark. I thought the benefit of the bulk import might be offset by writing so many lines to a text file. Seems not so much. The old way took over 90 seconds on 66,000 lines. The append-bulk import method took about 9 seconds. Now I get to rewrite a few import routines.

  Last modified: June 6, 2008 @ 6:40 am