Convert Parquet to CSV Using ClickHouse

Convert a Parquet file from S3 to CSV

To convert a Parquet file to CSV, we can use clickhouse-local. Start an interactive session:

clickhouse-local

Then run the following query:

SELECT *
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet')
INTO OUTFILE 'house_0.csv'
FORMAT CSV

Or run the same query as a single bash command:

clickhouse-local --query="SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet') INTO OUTFILE 'house_0.csv' FORMAT CSV"
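If several Parquet files need converting, the one-liner can be wrapped in a small script. The following is a minimal sketch, not part of the original walkthrough: the function names `csv_name_for`, `build_query`, and `convert_all` are hypothetical helpers, and it assumes every URL ends in `.parquet`, contains no single quotes, and is readable through the s3 table function.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ClickHouse) that wrap the one-liner above.

# Derive the output CSV name from a Parquet URL: .../house_0.parquet -> house_0.csv
csv_name_for() {
  local base
  base="$(basename "$1")"
  printf '%s.csv\n' "${base%.parquet}"
}

# Print the same query the one-liner uses, for the given URL.
build_query() {
  printf "SELECT * FROM s3('%s') INTO OUTFILE '%s' FORMAT CSV\n" "$1" "$(csv_name_for "$1")"
}

# Convert every URL passed as an argument; stop at the first failure.
convert_all() {
  local url
  for url in "$@"; do
    clickhouse-local --query="$(build_query "$url")" || return 1
  done
}
```

After sourcing the script, calling `convert_all` with the URL used above runs the same conversion as the one-liner; further URLs can be appended as extra arguments.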

Local Parquet to CSV

Check the Parquet file:

❯ file house_0.parquet
house_0.parquet: Apache Parquet

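Again, we can use clickhouse-local (delete any house_0.csv left over from the previous section first, because INTO OUTFILE refuses to overwrite an existing file):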

clickhouse-local

Then run the following query:

SELECT *
FROM `house_0.parquet`
INTO OUTFILE 'house_0.csv'
FORMAT CSV

Check the CSV file:

❯ wc -l house_0.csv
2772030 house_0.csv

Or, as before, we can do it in a single bash command:

clickhouse-local --query="SELECT * FROM 'house_0.parquet' INTO OUTFILE 'house_0.csv' FORMAT CSV"
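The `wc -l` check above counts newline characters, so it equals the number of rows only when no field contains an embedded line break. As a hedged sketch, the line count can be compared against the row count of the Parquet file itself, obtained with `count()`. The function name `check_row_count` and the `CLICKHOUSE_LOCAL` variable are hypothetical conveniences, not ClickHouse features.

```shell
#!/usr/bin/env bash
# Command used to run queries; overridable (hypothetical convenience variable).
CLICKHOUSE_LOCAL="${CLICKHOUSE_LOCAL:-clickhouse-local}"

# Compare the row count of a Parquet file with the line count of its CSV export.
# $1 = Parquet file, $2 = CSV file. Returns 0 when the counts agree.
check_row_count() {
  local rows lines
  rows="$("$CLICKHOUSE_LOCAL" --query="SELECT count() FROM '$1'")" || return 2
  # wc -l counts newlines; strip the padding some wc implementations add.
  lines="$(wc -l < "$2" | tr -d '[:space:]')"
  if [ "$rows" = "$lines" ]; then
    echo "OK: $rows rows"
  else
    echo "MISMATCH: parquet=$rows csv_lines=$lines"
    return 1
  fi
}
```

For the files in this walkthrough, the call would be `check_row_count house_0.parquet house_0.csv`.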

Convert CSV to Parquet

We can also go the other way and convert a CSV file to Parquet:

clickhouse-local --query="SELECT * FROM 'house_0.csv' INTO OUTFILE 'house_1.parquet' FORMAT Parquet"

Check the result:

❯ file house_1.parquet
house_1.parquet: Apache Parquet
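One thing to keep in mind with this round trip: the plain CSV format writes no header row, so when house_0.csv is read back, ClickHouse infers the schema and assigns generic column names (c1, c2, ...). A minimal sketch of a round trip that keeps the original names uses the CSVWithNames format instead. The function name `roundtrip_with_names` and the `CLICKHOUSE_LOCAL` variable are illustrative assumptions, not part of ClickHouse.

```shell
#!/usr/bin/env bash
# Command used to run queries; overridable (hypothetical convenience variable).
CLICKHOUSE_LOCAL="${CLICKHOUSE_LOCAL:-clickhouse-local}"

# Round trip that keeps column names: Parquet -> CSV with header -> Parquet.
# $1 = source Parquet, $2 = intermediate CSV, $3 = resulting Parquet.
roundtrip_with_names() {
  # CSVWithNames writes the column names as the first line of the CSV.
  "$CLICKHOUSE_LOCAL" --query="SELECT * FROM '$1' INTO OUTFILE '$2' FORMAT CSVWithNames" || return 1
  # Read the CSV explicitly as CSVWithNames so the header line supplies the names.
  "$CLICKHOUSE_LOCAL" --query="SELECT * FROM file('$2', 'CSVWithNames') INTO OUTFILE '$3' FORMAT Parquet" || return 1
  # Print the column names and types of the result for inspection.
  "$CLICKHOUSE_LOCAL" --query="DESCRIBE TABLE file('$3')"
}
```

A call might look like `roundtrip_with_names house_0.parquet house_0_named.csv house_2.parquet`, where the two output names are placeholders and neither file may already exist. Column types in the result are inferred from the CSV text, so they may differ from those in the source file.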