Convert Parquet to CSV Using ClickHouse
Download a Parquet file from S3
To convert a Parquet file to CSV, start clickhouse-local:
clickhouse-local
Then run the following query:
SELECT *
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet')
INTO OUTFILE 'house_0.csv'
FORMAT CSV
Or run the whole conversion as a single bash command:
clickhouse-local --query="SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet') INTO OUTFILE 'house_0.csv' FORMAT CSV"
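`FORMAT CSV` writes data rows only, so the resulting file has no header line. If the column names are needed downstream, ClickHouse also provides the `CSVWithNames` output format. The following is a minimal sketch, not part of the original walkthrough: `s3_parquet_to_csv_query` is a hypothetical helper (not a ClickHouse command) that only builds the query string, and the output name `house_0_with_names.csv` is an assumption chosen so it does not collide with `house_0.csv`.

```shell
# Hypothetical helper (not a ClickHouse command): build the conversion query
# for a Parquet file on S3, writing a CSV that starts with a header row.
s3_parquet_to_csv_query() {
  # $1 = URL of the Parquet file, $2 = output CSV path
  printf "SELECT * FROM s3('%s') INTO OUTFILE '%s' FORMAT CSVWithNames" "$1" "$2"
}

url='https://datasets-documentation.s3.eu-west-3.amazonaws.com/house_parquet/house_0.parquet'

# Print the query so it can be reviewed before it is run.
s3_parquet_to_csv_query "$url" 'house_0_with_names.csv'
echo
```

To run the conversion, pass the generated string to the same command used above: `clickhouse-local --query="$(s3_parquet_to_csv_query "$url" 'house_0_with_names.csv')"`. A separate output name is used because `INTO OUTFILE` by default fails when the target file already exists.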
Local Parquet to CSV
First, confirm that the local file is a Parquet file:
❯ file house_0.parquet
house_0.parquet: Apache Parquet
Again, start clickhouse-local:
clickhouse-local
Then run the following query:
SELECT *
FROM `house_0.parquet`
INTO OUTFILE 'house_0.csv'
FORMAT CSV
Check the CSV file by counting its lines:
❯ wc -l house_0.csv
2772030 house_0.csv
Or, as before, use a single bash command:
clickhouse-local --query="SELECT * FROM 'house_0.parquet' INTO OUTFILE 'house_0.csv' FORMAT CSV"
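The same one-liner extends to several files. The sketch below is an illustration under stated assumptions rather than part of the original walkthrough: `csv_name_for` and `convert_all_parquet` are hypothetical helper names, the Parquet files are assumed to sit in the current directory, and their names are assumed not to contain single quotes.

```shell
# Hypothetical helper: derive the CSV file name from a Parquet file name,
# e.g. house_0.parquet -> house_0.csv.
csv_name_for() {
  printf '%s\n' "${1%.parquet}.csv"
}

# Hypothetical helper: convert every *.parquet file in the current directory.
convert_all_parquet() {
  for f in *.parquet; do
    [ -e "$f" ] || continue      # the glob matched nothing
    out="$(csv_name_for "$f")"
    [ -e "$out" ] && continue    # skip files that were already converted
    clickhouse-local --query="SELECT * FROM '$f' INTO OUTFILE '$out' FORMAT CSV"
  done
}

csv_name_for house_0.parquet   # prints house_0.csv
```

Calling `convert_all_parquet` in a directory of Parquet files then runs one conversion per file. Existing CSV files are skipped because `INTO OUTFILE` by default refuses to overwrite a file that is already there.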
Convert CSV to Parquet
The conversion also works in the other direction, from CSV to Parquet:
clickhouse-local --query="SELECT * FROM 'house_0.csv' INTO OUTFILE 'house_1.parquet' FORMAT Parquet"
Check the result:
❯ file house_1.parquet
house_1.parquet: Apache Parquet
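One caveat of this round trip, stated as an expectation rather than a verified result: `house_0.csv` was written with `FORMAT CSV`, which has no header row, so ClickHouse will most likely assign generated column names (such as `c1`, `c2`) when reading it back, and `house_1.parquet` would then carry those names instead of the original ones. A minimal sketch for comparing the two schemas, assuming both files are in the current directory; `describe_query` is a hypothetical helper, not a ClickHouse command:

```shell
# Hypothetical helper: build a query that lists the column names and types
# of a local file, using the file() table function.
describe_query() {
  printf "DESCRIBE TABLE file('%s')" "$1"
}

# Compare the schemas only when clickhouse-local and both files are available.
if command -v clickhouse-local >/dev/null 2>&1 &&
   [ -f house_0.parquet ] && [ -f house_1.parquet ]; then
  clickhouse-local --query="$(describe_query house_0.parquet)"
  clickhouse-local --query="$(describe_query house_1.parquet)"
else
  echo "clickhouse-local or the input files were not found; nothing to compare" >&2
fi
```

If the column names do differ, writing the intermediate file with `FORMAT CSVWithNames` instead of `FORMAT CSV` keeps a header row that ClickHouse can use when converting back to Parquet.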