Apr 20, 2016
When it comes to transferring data around between applications, there are basically two methods that you come across again and again.
If the application has an API you might be able to work with JSON or XML. This usually makes working with data from other applications fairly easy.
However, often the only way to get data in or out of an application is via CSV. CSVs are a great way of dumping data, or of opening that data in Excel, but they can be a bit of a pain to work with in code.
Fortunately, Ruby has a CSV class as part of the standard library. I find myself using CSVs all of the time, so having native CSV support is really great!
In today’s tutorial we will be looking at working with CSVs in Ruby.
One of the first things you will likely want to do is to open an existing CSV file. The easiest way is to simply use the read method and pass it the path to a file:
require 'csv'
rows = CSV.read('path/to/file.csv')
This will read the CSV file into an array of arrays so you can work with the data.
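To make the "array of arrays" shape concrete, here is a minimal sketch. The file contents, column names and the temporary file are made up for illustration, so that the example runs on its own:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data, written to a temporary file so the example is self-contained
file = Tempfile.new(['users', '.csv'])
file.write("username,email_address\nalice,alice@example.com\nbob,bob@example.com\n")
file.close

rows = CSV.read(file.path)

# Without any options, the header line is simply the first row
header = rows.first   # ["username", "email_address"]
data = rows.drop(1)   # the remaining rows, each an array of strings

data.each { |username, email| puts "#{username} <#{email}>" }
```

Every field comes back as a string, so any numbers or dates need converting yourself.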
Reading a CSV all in one go is often possible with small CSV files. However, once a CSV starts to get pretty big, you don’t want to be reading the whole thing at once.
Instead you can simply read each line of the CSV, one at a time:
CSV.foreach('path/to/file.csv') do |row|
  # do something with row
end
Each row of the CSV will be passed to the block so you can do what you need to do with it. Reading a big file line by line is a much better way of working with massive files because only the current row is held in memory, rather than the whole file.
Whenever I’m working with a CSV I will always read it line by line, no matter how big the file is. I just find it easier to always use the same method.
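As a rough sketch of the line-by-line approach, the following keeps a running total without ever storing the rows. The file contents (a name and a quantity per line) are an assumption for the sake of the example:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file: one product name and one quantity per line
file = Tempfile.new(['stock', '.csv'])
file.write("apple,3\npear,5\nplum,4\n")
file.close

row_count = 0
total = 0

CSV.foreach(file.path) do |row|
  # row is an array of strings, e.g. ["apple", "3"]
  row_count += 1
  total += Integer(row[1])
end

puts "#{row_count} rows, #{total} items in total"
```

Only the counters survive between iterations, so memory use stays flat however long the file is.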
To make working with CSVs easier, the foreach method takes a second optional hash of options.
The two options I use the most often are headers and header_converters:
CSV.foreach(
  'path/to/file.csv',
  headers: true,
  header_converters: :symbol
) do |row|
  User.find_or_create_by(email: row[:email_address]) do |user|
    user.username = row[:username]
    user.email = row[:email_address]
  end
end
The headers option will treat the first row of the CSV as the headers of the data, rather than just another row, so each row can be accessed by header name instead of by position.
And the header_converters option will transform those header names before they are used as keys. In the example above I’ve passed :symbol, which downcases each header, replaces spaces with underscores and converts it to a symbol, and so I can access the columns from a row by passing the header as a symbol (see What are Symbols in Ruby?).
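To see what the two options do to the keys of a row, here is a small sketch. It uses CSV.parse on a string instead of foreach on a file purely to keep the example self-contained (parse accepts the same options), and the sample data is invented:

```ruby
require 'csv'

# Hypothetical CSV content with human-friendly header names
data = "Username,Email Address\nalice,alice@example.com\n"

# headers: true on its own keys each row by the header text as written
plain = CSV.parse(data, headers: true)
plain_headers = plain.headers               # ["Username", "Email Address"]
plain_email = plain.first['Email Address']  # "alice@example.com"

# Adding header_converters: :symbol turns the headers into snake_case symbols
symbolized = CSV.parse(data, headers: true, header_converters: :symbol)
symbol_headers = symbolized.headers              # [:username, :email_address]
symbol_email = symbolized.first[:email_address]  # "alice@example.com"

puts plain_headers.inspect
puts symbol_headers.inspect
```

Because the header row is consumed as headers, it is not yielded as a data row in either case.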
If you allow your users to export their data, CSV is usually a safe option. A lot of non-technical users will not know what to do with JSON or XML, but most will recognise that they can work with their data in Excel if you provide them with a CSV.
The easiest way of saving data to a CSV is like this:
CSV.open('path/to/file.csv', 'wb') do |csv|
  csv << %w[row of CSV data]
  csv << %w[another row]
end
Inside the block you simply need to add each row of data to the file.
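A typical export writes a header row first and then one row per record. The sketch below assumes the records are plain hashes; the users array and the temporary path are hypothetical stand-ins for your own data:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical records to export
users = [
  { username: 'alice', email_address: 'alice@example.com' },
  { username: 'bob', email_address: 'bob@example.com' }
]

path = File.join(Dir.mktmpdir, 'users.csv')

CSV.open(path, 'wb') do |csv|
  # Header row first, taken from the keys of the first record
  csv << users.first.keys.map(&:to_s)
  # Then one row per record, in the same column order
  users.each { |user| csv << user.values }
end

puts File.read(path)
```

Reading the file back with CSV.read returns the same rows as arrays of strings, which is a quick way to sanity-check an export.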
Having a native way to work with CSVs is an excellent addition to the Ruby Standard Library. I find myself working with CSVs all of the time, and so not having to rely on yet another third-party gem makes working with CSVs a piece of cake!
I’ve covered the use cases that I find myself using the most often in this tutorial, but the CSV class actually has a number of other options you might find useful.
To read more about using the CSV class in Ruby, I would recommend you take a look at the documentation.