Utility to manipulate long csv files
Manipulating long csv files using graphic user interfaces might not be the most efficient at times. I want an utility that helps me manipulate long csv files. This utility should:
Glossary of terms
- Column: header column name of a csv file
- Index: line number of a csv file
- Row: line of a csv file
npm install --global csv-tr
using yarn
yarn add global csv-tr
using npx
npx csv-tr --help
Usage: csv-tr [options] [source | input stream]
transforms given csv file or stream and outputs the result
Options:
-V, --version output the version number
-o, --only <columns> output only specified columns (comma separated). Not to be used with --exclude.
-e, --exclude <columns> exclude specified columns (comma separated). Not to be used with --only.
-t, --transform <js-file|js-expression> transform rows by given JavaScript expression. Ej: -t "row.email = row.email.toLowerCase()"
-f, --filter <js-file|js-expression> filter rows by given JavaScript file or expression. Ej: -f "row.state === 'FL'"
-s, --sort <[sort-column]:[sort-order: 1=ASC | -1=DESC]> sorts rows by column:order (comma separated) Ej: -s "firstName:1,lastName:-1"
-h, --help display help for command
Imagine a CSV file called contacts.csv
with content
name,email,state
Juan,[email protected],FL
Miguel,[email protected],NY
Jesus,[email protected],NY
Filtering values using the option --filter
or -f
followed by a JavaScript expression that will be evaluated
against a function that must return boolean, and looks like:
(row, index) => { return /* your JavaScript expression */ }
Alternatively a file that exports a function with the same signature is also accepted:
// my-filter-file
module.exports = (row/* , index */) => {
return /@gmail.com$/i.test(row.email)
}
Each row
is nothing but a line of the csv file represented as a JSON object streamed by
csv-parser.
The index
value is given row number starting at 0
. Meaning index
of the first row (which is not the column names header)
equals 0
.
csv-tr contacts.csv -f '/@gmail.com$/i.test(row.email)' > gmail-contacts.csv
gmail-contacts.csv
would look like:
name,email,state
Juan,[email protected],FL
Jesus,[email protected],NY
Slicing a given range of rows using the index
value
csv-tr contacts.csv -f 'index > 0 && index < 2'
Would output:
name,email,state
Miguel,[email protected],NY
Transforming values using the option --transform
or -t
followed by a JavaScript expression that will be evaluated
against a function that looks like.
(row, index) => {
/* your js mutations go here */
}
Alternatively a file that exports a function with the same signature is also accepted:
// my-transform-file
module.exports = (row/* , index */) => {
row.name = row.name.toUpperCase()
row.email = row.email.toUpperCase()
row.initial = row.name[0]
return row
}
Using the same contacts.csv
input sample.
csv-tr contacts.csv -t 'row.name = row.name.toUpperCase(); row.email = row.email.toUpperCase(); row.initial = row.name[0]' > contacts-uppercase.csv
contacts-uppercase.csv
would look like:
name,email,state,initial
JUAN,[email protected],FL,J
MIGUEL,[email protected],NY,M
JESUS,[email protected],NY,J
Using the same contacts.csv
input sample.
csv-tr contacts.csv -o email,state > contact-email-state.csv
contact-email-state.csv
would look like:
email,state
[email protected],FL
[email protected],NY
[email protected],NY
Using the same contacts.csv
input sample.
csv-tr contacts.csv -e email,state > contact-names.csv
contact-names.csv
would look like:
name
Juan
Miguel
Jesus
Using the same contacts.csv
input sample, imagine sorting by state
-> DESC
and name
-> ASC
:
Sorting will guess numeric and date kind of values, and treat them accordingly.
csv-tr contacts.csv -s state:-1,name:1
Would output:
name,email,state
Jesus,[email protected],NY
Miguel,[email protected],NY
Juan,[email protected],FL
Using all above's examples at once with the same contacts.csv
input sample
const fs = require('fs');
const { csvTr, sort, csvStringify } = require('csv-tr');
// un-comment any or multiple of the options below, run it and then take a look at result.csv
csvTr(fs.createReadStream('./tests/contacts.csv'), {
// filter: (row, index) => { return /@gmail.com$/i.test(row.email) },
// transform: (row, index) => { row.name = row.name.toUpperCase(); row.email = row.email.toUpperCase(); return row },
// only: ['email', 'state'],
// exclude: ['email', 'state'],
}).pipe(csvStringify()).pipe(fs.createWriteStream('result.csv'))
// SORTING
// mind sorting buffers all rows
const csvStreamToSort = csvTr(fs.createReadStream('contacts.csv'))
sort(csvStreamToSort, { state: -1, name: 1 }).then(sortedStream => {
sortedStream.pipe(csvStringify()).pipe(fs.createWriteStream('result-sorted.csv'))
})
result-sorted.csv
would look like:
name,email,state
Jesus,[email protected],NY
Miguel,[email protected],NY
Juan,[email protected],FL
© 2021 Martin Rafael [email protected]