Quickly importing a huge CSV file into MySQL

2019-10-09

There are many ways to import CSV data into MySQL, and plenty of information about them is easy to find. The previous article was about exporting a CSV file from a huge data set with PHP; this article goes in the opposite direction.

However, there is no single right answer, because importing a CSV into a DB is affected by many factors.
The following ignores "upload_max_filesize" on the server and the available memory size.

How to import CSV to DB

The following are some ways of importing a CSV.

  1. INSERT row by row: extremely slow
  2. BULK INSERT: fast
  3. BATCH INSERT: slow
  4. LOAD DATA LOCAL INFILE: fastest

I use Laravel (PHP), but insert speed depends on the MySQL side, so any language or framework should give roughly the same result.

Insert row by row

Naturally, this is the slowest way to import. It reads one row from the CSV file, then executes one INSERT query per record.

$file = new SplFileObject($path);
$file->setFlags(SplFileObject::READ_CSV);

foreach ($file as $row => $line) {
    // One INSERT query is executed for every single CSV row
    Hoge::create([
        'huga' => $line[0],
    ]);
}

We shouldn't use this approach.
I tried importing a CSV file with over 300,000 records into MySQL this way, and it caused a request timeout error because it took over 5 minutes.

BULK INSERT

BULK INSERT means inserting many rows with a single query.

INSERT INTO hoge VALUES ('test1'), ('test2'), ('test3');

It is no surprise that this way is fast, and it does not need a transaction because all the data is inserted at once.

$csvData = [
    ['hoge', 'foo', ...],
    ...
];
DB::table('hoge')->insert($csvData);

At first glance this seems like the best way, but it also has a problem.

In the case of MySQL, there is a limit on query size called "max_allowed_packet"; in my environment the whole query had to fit within 16MB.

https://qiita.com/_am_/items/91824da643256d46b847

The query generated from my CSV of over 300,000 rows was larger than 16MB.
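
You can check the limit in your own environment first. A minimal sketch with Laravel's DB facade, assuming a configured MySQL connection:

$row = DB::selectOne("SHOW VARIABLES LIKE 'max_allowed_packet'");
echo $row->Value; // size in bytes, e.g. 16777216 = 16MB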

When inserting a huge, unknown amount of data into MySQL, the following BATCH INSERT is a better and more general technique.

BATCH INSERT

Inserting row by row is extremely slow, and bulk insert is limited by query size, so batch insert is generally the better way: for example, issue one insert for every 2,000 records.

$file = new SplFileObject($path);
$file->setFlags(SplFileObject::READ_CSV);

$cnt = 0;
$insertData = [];
foreach ($file as $row => $line) {
    $cnt++;
    $insertData[] = [
        'huga' => $line[0],
    ];
    // Flush one bulk insert for every 2,000 accumulated rows
    if ($cnt >= 2000) {
        DB::table('hoge')->insert($insertData);
        $cnt = 0;
        $insertData = [];
    }
}
// Insert the leftover rows (fewer than 2,000) after the loop
if ($insertData !== []) {
    DB::table('hoge')->insert($insertData);
}

This way inserts the data correctly no matter how large it is, and it reaches a reasonable speed, so batch insert is the better general-purpose choice.

But it still took over 2 minutes to finish inserting my CSV of over 300,000 rows.

LOAD DATA LOCAL INFILE

If the execution time of batch insert is unacceptable, consider "LOAD DATA LOCAL INFILE". This loads the CSV file directly into MySQL.

"LOAD DATA LOCAL INFILE" is faster than BULK INSERT and is not limited by "max_allowed_packet". It seems to be the best way to insert huge data.

$path = "test.csv";
// Note: $path is interpolated into the SQL string, so it must be a trusted value
$sql = <<< EOM
LOAD DATA LOCAL INFILE '$path' REPLACE INTO TABLE hoge
CHARACTER SET utf8 FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES;
EOM;
DB::insert($sql, []);
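
Note that "LOAD DATA LOCAL INFILE" only works when loading local files is allowed on both the MySQL server (the "local_infile" system variable) and the client connection. In Laravel this is usually enabled through the PDO options of the MySQL connection; the following is only a sketch of the relevant part of config/database.php, so adjust it to your own connection settings:

// config/database.php (excerpt) - only the option relevant here
'mysql' => [
    // ... your existing host, database, username, password settings ...

    // Allow the PDO client to send a local file to the server
    'options' => [
        PDO::MYSQL_ATTR_LOCAL_INFILE => true,
    ],
],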

This way took only about 2 seconds to insert the 300,000 rows. That is awesome speed! It is fast because, like bulk insert, it loads the data in one operation, handing the CSV file itself to MySQL.

Of course, it may still be impossible to import an extremely huge CSV file, for example 100GB, this way, because the server can run out of memory.

So batch insert is the better way if you need to insert huge data whose size is unpredictable. On the other hand, if you need to import a huge CSV file quickly, "LOAD DATA LOCAL INFILE" is the one to use.