td-spark API Documentation

td-spark is a library for reading and writing tables in Treasure Data (TD) through the DataFrame API of Apache Spark. For Python users, the td-pyspark package for PySpark is available on PyPI.

(Figure: td-overview.png — overview of td-spark)

Features

Important: Access to Treasure Data via td-spark is disabled by default. Contact support@treasure-data.com first to enable this feature.

td-spark supports:

  • Reading and writing tables in TD through DataFrames of Spark.

  • Running Spark SQL queries against DataFrames.

  • Submitting Presto/Hive SQL queries to TD and reading the query results as DataFrames.

  • Using Spark DataFrames and pandas DataFrames interchangeably in PySpark.

  • Running on any hosted Spark service, such as Amazon EMR or Databricks Cloud. It is also possible to run PySpark on Google Colaboratory.
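The features above can be sketched with the td-pyspark package. This is a minimal, non-runnable-as-is example: it assumes a valid TD API key, the td-spark feature enabled on your account, and a local Spark installation; the database and table names refer to TD's public sample_datasets, and the API key value is a placeholder.

```python
# Sketch of td-pyspark usage, assuming the td-spark feature is enabled
# for your account. Replace YOUR_TD_API_KEY with a real key.
import td_pyspark
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("td-spark-example")
    .config("spark.td.apikey", "YOUR_TD_API_KEY")  # TD API key (placeholder)
    .config("spark.td.site", "us")                 # TD site: us, jp, eu01, ap02
    .getOrCreate()
)

td = td_pyspark.TDSparkContext(spark)

# Read a TD table as a Spark DataFrame, limited to the last day of data
df = td.table("sample_datasets.www_access").within("-1d").df()
df.show(5)

# Run a Spark SQL query against the DataFrame
df.createOrReplaceTempView("www_access")
spark.sql("SELECT method, count(*) AS cnt FROM www_access GROUP BY method").show()

# Submit a Presto query to TD and read the result as a DataFrame
td.presto("SELECT count(1) FROM sample_datasets.www_access").show()

# Convert to a pandas DataFrame for local analysis
pdf = df.limit(100).toPandas()
```

Because the result of each read is an ordinary Spark DataFrame, the same code works unchanged on hosted Spark services such as Amazon EMR or Databricks, as long as the td-spark package and your API key are configured there.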