Download
td-spark uses a YY.MM.patch versioning scheme, where YY.MM is the release year and month, followed by the patch update number. Note: td-spark-assembly-latest_xxxx.jar is an alias for the latest release version, td-spark-assembly-YY.MM.patch_(spark version).
WARNING: Spark 2.4.x + Scala 2.11 support has been deprecated since December 2020. Consider migrating to Spark 3.x + Scala 2.12.
For Spark (Scala)
td-spark is a library that can be used with your own Spark cluster. Download one of the jar files below and pass its path to the --jars option of the spark-submit command (--jars (path to td-spark-assembly-xxx.jar)):
td-spark-assembly-latest_spark3.5.1.jar (Spark 3.5.1, Scala 2.13)
td-spark-assembly-latest_spark3.4.2.jar (Spark 3.4.2, Scala 2.12)
td-spark-assembly-latest_spark3.3.0.jar (Spark 3.3.0, Scala 2.12)
td-spark-assembly-latest_spark3.2.1.jar (Spark 3.2.1, Scala 2.12)
td-spark-assembly-latest_spark2.4.7.jar (Spark 2.4.7, Scala 2.11)
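For example, a submit command might look like the following. This is a sketch: the jar and application file names are illustrative, and spark.td.apikey must be set to your Treasure Data API key.

```shell
# Pass the downloaded td-spark assembly to the cluster alongside your application.
# File names are illustrative; use the jar version you actually downloaded.
spark-submit \
  --jars td-spark-assembly-latest_spark3.5.1.jar \
  --conf spark.td.apikey=$TD_API_KEY \
  your_app.py
```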
For PySpark (Python)
Install td-pyspark from PyPI with pip:
$ pip install td-pyspark
If you want to install PySpark as well, specify the [spark] option:
$ pip install td-pyspark[spark]
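A minimal usage sketch after installation, assuming a local Spark session and a valid TD API key (the table name refers to TD's public sample dataset; adjust to your environment):

```python
import td_pyspark
from pyspark.sql import SparkSession

# Build a local Spark session; spark.td.apikey must hold a valid TD API key.
spark = (SparkSession.builder
         .master("local[*]")
         .config("spark.td.apikey", "YOUR_TD_API_KEY")
         .appName("td-pyspark-example")
         .getOrCreate())

td = td_pyspark.TDSparkContext(spark)

# Read the last day of a table as a Spark DataFrame.
df = td.table("sample_datasets.www_access").within("-1d").df()
df.show()
```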
Release Notes
v24.4.1
Upgraded to Spark 3.5.1
(breaking) Dropped support for Scala 2.12. Use Scala 2.13 instead.
Support JDK17
Note: This version still works with JDK8, but upcoming td-spark versions are planned to support only JDK17 or later, since Spark 4.x will require JDK17 and JDK17 is now widely supported (AWS, Databricks, etc.).
Downloads
td-spark-assembly-24.4.1_spark3.5.1.jar (Spark 3.5.1, Scala 2.13)
v24.4.0
Fixed a JVM hang issue when stopping Spark applications
Upgrade to Airframe 24.3.0
Downloads
td-spark-assembly-24.4.0_spark3.4.2.jar (Spark 3.4.2, Scala 2.12)
v24.2.0
Support Spark 3.4.2
Various internal library version upgrades
Downloads
td-spark-assembly-24.2.0_spark3.4.2.jar (Spark 3.4.2, Scala 2.12)
v22.7.1
Recompiled td-spark for JDK8
Internal library version upgrade
Upgrade Airframe to 22.7.2 (for JDK8 support)
Downloads
td-spark-assembly-22.7.1_spark3.3.0.jar (Spark 3.3.0, Scala 2.12)
v22.7.0
Support setting a timezone at .within(durationString, timezone:ZoneId)
Add TD_SPARK_COLOR JVM property to enable (or disable) colored logging
[scala only] Add an experimental df.writeToTD method to configure data shuffling at table writes
Internal library version upgrade
Upgrade Airframe to 22.7.1
Upgrade fluency-treasuredata to 2.6.5
Downloads
td-spark-assembly-22.7.0_spark3.3.0.jar (Spark 3.3.0, Scala 2.12)
v22.6.1
Upgrade to Spark 3.3.0
Internal library version upgrade
Upgrade msgpack-java to 0.9.3 to support JDK17
Downloads
td-spark-assembly-22.6.1_spark3.3.0.jar (Spark 3.3.0, Scala 2.12)
v22.6.0
Upgrade to Spark 3.2.1
Add retry when reading column blocks to stabilize partition file read
Internal library version upgrade
Upgrade to Scala 2.12.16
Upgrade to Jackson 2.12.7
Upgrade msgpack-java to 0.9.2
Upgrade scala-parser-combinator to 2.1.1
Upgrade fluency-treasuredata to 2.6.4
Upgrade to Airframe 22.6.4
Downloads
td-spark-assembly-22.6.0_spark3.2.1.jar (Spark 3.2.1, Scala 2.12)
v21.10.0
Upgrade to Spark 3.2.0
Internal library version upgrade:
Upgrade jackson to 2.12.3
Upgrade Airframe to 21.10.0
Upgrade msgpack-java to 0.9.0
Upgrade td-client-java to 0.9.6
Upgrade to presto-jdbc 350
Upgrade fluency-treasuredata to 2.6.0
Downloads
td-spark-assembly-21.10.0_spark3.2.0.jar (Spark 3.2.0, Scala 2.12)
v21.5.0
Upgrade to Spark 3.1.1 and Hadoop 3.2
Ramp up reading a large number of partitions
(Experimental) Support vectorized reader. To enable it, set spark.td.enableVectorizedReader to true. This is currently experimental and may change in future versions.
Internal library version upgrades:
Upgrade json4s to 3.7.0-M5
Upgrade msgpack-java to 0.8.22
Upgrade fluency to 2.5.1
Upgrade td-client-java to 0.9.5
Upgrade Airframe to 21.3.1
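The experimental vectorized reader flag above can be set at submit time. A sketch (jar and application file names are illustrative):

```shell
# Enable the experimental vectorized reader via Spark configuration.
spark-submit \
  --jars td-spark-assembly-21.5.0_spark3.1.1.jar \
  --conf spark.td.enableVectorizedReader=true \
  your_app.py
```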
Downloads
td-spark-assembly-21.5.0_spark3.1.1.jar (Spark 3.1.1, Scala 2.12)
v21.3.0
Upgrade to Spark 3.0.2 and Python 3.9
Fixed a bug with null values included in ArrayType columns
Support for Spark 2.4.x was removed as of 21.3.0
Downloads
td-spark-assembly-21.3.0_spark3.0.2.jar (Spark 3.0.2, Scala 2.12)
v20.12.0
Fixed a bug when creating partitions
Downloads
td-spark-assembly-20.12.0_spark2.4.7.jar (Spark 2.4.7, Scala 2.11)
td-spark-assembly-20.12.0_spark3.0.1.jar (Spark 3.0.1, Scala 2.12)
v20.10.0
Upgrade to Spark 2.4.7, Spark 3.0.1
Fixed a bug that caused DataFrame uploads to fail if the DataFrame contains a time column whose type is not Long
Fixed a bug when reading Map type values inside a column
Fixed the partition reader to respect the spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes configuration parameters. This reduces the number of necessary Spark tasks by packing multiple partition read tasks into a single task. See also the Spark SQL Performance Tuning Guide.
Internal library version upgrades:
Upgrade jackson to 2.10.5
Upgrade json4s to 3.6.6
Upgrade fluency to 2.4.1
Upgrade presto-jdbc to 338 to fix a performance issue when using JDK11
Upgrade Airframe to 20.10.0
Upgrade to Scala 2.11.12, Scala 2.12.12
Upgrade td-client-java to 0.9.3
Downloads
td-spark-assembly-20.10.0_spark2.4.7.jar (Spark 2.4.7, Scala 2.11)
td-spark-assembly-20.10.0_spark3.0.1.jar (Spark 3.0.1, Scala 2.12)
v20.6.2
A bug fix for properly handling HTTP responses when receiving 5xx errors from APIs.
Downloads
td-spark-assembly-20.6.2_spark2.4.6.jar (Spark 2.4.6, Scala 2.11)
td-spark-assembly-20.6.2_spark3.0.0.jar (Spark 3.0.0, Scala 2.12)
v20.6.1
This release supports Spark 2.4.6 and Spark 3.0.0 (official release).
Downloads
td-spark-assembly-20.6.1_spark2.4.6.jar (Spark 2.4.6, Scala 2.11)
td-spark-assembly-20.6.1_spark3.0.0.jar (Spark 3.0.0, Scala 2.12)
v20.6.0
Downloads
td-spark-assembly-20.6.0_spark2.4.5.jar (Spark 2.4.5, Scala 2.11)
td-spark-assembly-20.6.0_spark3.0.0-preview2.jar (Spark 3.0.0-preview2, Scala 2.12)
Major Changes
Support swapping table contents
Bug Fixes
Bump to msgpack-java 0.8.20 with JDK8 compatibility
Fixed NPE in reading specific Array column values
Handle 504 responses properly
Internal Changes
Upgrade to Airframe 20.5.2
v20.4.0
Downloads
td-spark-assembly-20.4.0_spark2.4.5.jar (Spark 2.4.5, Scala 2.11)
td-spark-assembly-20.4.0_spark3.0.0-preview2.jar (Spark 3.0.0-preview2, Scala 2.12)
Changes
Spark 2.4.5 support
Support ap02 for spark.td.site configuration
v20.2.0
Downloads
td-spark-assembly-20.2.0_spark2.4.4.jar (Spark 2.4.4, Scala 2.11)
td-spark-assembly-20.2.0_spark3.0.0-preview2.jar (Spark 3.0.0-preview2, Scala 2.12)
Changes
Spark 3.0.0-preview2 support
v19.11.1
Downloads
td-spark-assembly-19.11.1_spark2.4.4.jar (Spark 2.4.4, Scala 2.11)
Bug Fixes
Fixed a bug in uploading DataFrames whose time column contains null or non-unixtime values
Fixed an error when installing td_pyspark using Python 2
v19.11.0
Downloads
td-spark-assembly-19.11.0_spark2.4.4.jar (Spark 2.4.4, Scala 2.11)
td-spark-assembly-19.11.0_spark3.0.0-preview.jar (Spark 3.0.0-preview, Scala 2.12)
Commands for running spark-shell with Docker:
Spark 2.4.4:
docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-shell:19.11.0_spark2.4.4
Spark 3.0.0-preview:
docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-shell:19.11.0_spark3.0.0-preview
PySpark 2.4.4:
docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-pyspark:19.11.0_spark2.4.4
PySpark 3.0.0.dev0:
docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-pyspark:19.11.0_spark3.0.0-preview
Major Changes
Support Spark 2.4.4 (Scala 2.11) and Spark 3.0.0-preview (Scala 2.12, pyspark 3.0.0.dev0)
Support using multiple TD accounts with val td2 = td.withApiKey("...") (Scala) or td2 = td.with_apikey("...") (Python)
Bug Fixes
Fixed the table preview of array column values inserted from td-spark
Internal Changes
Upgrade to Airframe 19.11.0
v19.7.0
Downloads
td-spark-assembly_2.11-19.7.0.jar (Spark 2.4.3, Scala 2.11)
Major Changes
Fully support PySpark. Install the package from PyPI: https://pypi.org/project/td-pyspark/
Bug fixes
Fixed a scala-parser-combinator error when using td.presto(sql)
Bump to Fluency 2.3.2 with a configuration fix
Add retry around drop table/database
Internal Changes
Upgrade to Airframe 19.8.9