# Download

td-spark uses the `YY.MM.patch` versioning scheme, which encodes the release year, month, and patch update number.

Note: `td-spark-assembly-latest_xxxx.jar` is an alias to the latest release version `td-spark-assembly-YY.MM.patch_(spark version)`.

> WARNING: Spark 2.4.x + Scala 2.11 support has been deprecated since December 2020. Consider migrating to Spark 3.x + Scala 2.12.

## For Spark (Scala)

td-spark is a library that can be used with your own Spark cluster. Download one of the jar files below and pass the file path to the spark-submit command with `--jars (path to td-spark-assembly-xxx.jar)`:

- [td-spark-assembly-latest_spark3.3.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-latest_spark3.3.0.jar) (Spark 3.3.0, Scala 2.12)
- [td-spark-assembly-latest_spark3.2.1.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-latest_spark3.2.1.jar) (Spark 3.2.1, Scala 2.12)
- [td-spark-assembly-latest_spark2.4.7.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-latest_spark2.4.7.jar) (Spark 2.4.7, Scala 2.11)

## For PySpark (Python)

Install td-pyspark from PyPI with `pip`:

```
$ pip install td-pyspark
```

If you want to install PySpark as well, specify the `[spark]` option:

```
$ pip install td-pyspark[spark]
```

## Docker Images

Docker images of td-spark packaged with Spark are available on Docker Hub under [devtd](https://hub.docker.com/orgs/devtd/repositories). Here are example commands for running spark-shell with td-spark using Docker:

Spark 3.3.0:

```sh
$ docker pull devtd/td-spark-shell:latest_spark3.3.0
$ docker run -it -e TD_API_KEY=$TD_API_KEY devtd/td-spark-shell:latest_spark3.3.0
```

PySpark 3.3.0:

```sh
$ docker pull devtd/td-spark-pyspark:latest_spark3.3.0
$ docker run -it -e TD_API_KEY=$TD_API_KEY devtd/td-spark-pyspark:latest_spark3.3.0
```

# Release Notes

## v22.7.1

- Recompile td-spark for JDK8
- Internal library version upgrades:
  - Upgrade Airframe to 22.7.2 (for JDK8 support)

### Downloads

- [td-spark-assembly-22.7.1_spark3.3.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-22.7.1_spark3.3.0.jar) (Spark 3.3.0, Scala 2.12)

## v22.7.0

- Support setting a timezone with `.within(durationString, timezone: ZoneId)` (see the sketch after this list)
- Add the `TD_SPARK_COLOR` JVM property to enable (or disable) colored logging
- [Scala only] Add an experimental `df.writeToTD` method to configure data shuffling at table writes
- Internal library version upgrades:
  - Upgrade Airframe to 22.7.1
  - Upgrade fluency-treasuredata to 2.6.5
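As a usage hint, here is a minimal sketch of the timezone-aware `.within` filter in a spark-shell session launched with the td-spark jar. The table name is only an example, and the exact API details may differ from this sketch:

```scala
import java.time.ZoneId
import com.treasuredata.spark._

// Obtain the td-spark context from the running SparkSession (`spark`)
val td = spark.td

// Read only the last day of data, interpreting the duration in Asia/Tokyo time.
// `sample_datasets.www_access` is TD's public sample table; substitute your own.
val df = td.table("sample_datasets.www_access")
  .within("-1d", ZoneId.of("Asia/Tokyo"))
  .df

df.show()
```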
### Downloads

- [td-spark-assembly-22.7.0_spark3.3.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-22.7.0_spark3.3.0.jar) (Spark 3.3.0, Scala 2.12)

## v22.6.1

- Upgrade to [Spark 3.3.0](https://spark.apache.org/releases/spark-release-3-3-0.html)
- Internal library version upgrades:
  - Upgrade msgpack-java to 0.9.3 to support JDK17

### Downloads

- [td-spark-assembly-22.6.1_spark3.3.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-22.6.1_spark3.3.0.jar) (Spark 3.3.0, Scala 2.12)

## v22.6.0

- Upgrade to [Spark 3.2.1](https://spark.apache.org/releases/spark-release-3-2-1.html)
- Add retries when reading column blocks to stabilize partition file reads
- Internal library version upgrades:
  - Upgrade Scala to 2.12.16
  - Upgrade Jackson to 2.12.7
  - Upgrade msgpack-java to 0.9.2
  - Upgrade scala-parser-combinators to 2.1.1
  - Upgrade fluency-treasuredata to 2.6.4
  - Upgrade Airframe to 22.6.4

### Downloads

- [td-spark-assembly-22.6.0_spark3.2.1.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-22.6.0_spark3.2.1.jar) (Spark 3.2.1, Scala 2.12)

## v21.10.0

- Upgrade to [Spark 3.2.0](https://spark.apache.org/releases/spark-release-3-2-0.html)
- Internal library version upgrades:
  - Upgrade Jackson to 2.12.3
  - Upgrade Airframe to 21.10.0
  - Upgrade msgpack-java to 0.9.0
  - Upgrade td-client-java to 0.9.6
  - Upgrade presto-jdbc to 350
  - Upgrade fluency-treasuredata to 2.6.0

### Downloads

- [td-spark-assembly-21.10.0_spark3.2.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-21.10.0_spark3.2.0.jar) (Spark 3.2.0, Scala 2.12)

## v21.5.0

- Upgrade to Spark 3.1.1 and Hadoop 3.2
- Speed up reading a large number of partitions
- (Experimental) Support a vectorized reader. To enable it, set `spark.td.enableVectorizedReader` to `true`. This feature is experimental and may change in future versions.
- Internal library version upgrades:
  - Upgrade json4s to 3.7.0-M5
  - Upgrade msgpack-java to 0.8.22
  - Upgrade Fluency to 2.5.1
  - Upgrade td-client-java to 0.9.5
  - Upgrade Airframe to 21.3.1

### Downloads

- [td-spark-assembly-21.5.0_spark3.1.1.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-21.5.0_spark3.1.1.jar) (Spark 3.1.1, Scala 2.12)

## v21.3.0

- Upgrade to Spark 3.0.2 and Python 3.9
- Fixed a bug triggered by null values in ArrayType columns
- Removed support for Spark 2.4.x

### Downloads

- [td-spark-assembly-21.3.0_spark3.0.2.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-21.3.0_spark3.0.2.jar) (Spark 3.0.2, Scala 2.12)

## v20.12.0

- Fixed a bug when creating partitions

### Downloads

- [td-spark-assembly-20.12.0_spark2.4.7.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.12.0_spark2.4.7.jar) (Spark 2.4.7, Scala 2.11)
- [td-spark-assembly-20.12.0_spark3.0.1.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.12.0_spark3.0.1.jar) (Spark 3.0.1, Scala 2.12)

## v20.10.0

- Upgrade to Spark 2.4.7 and Spark 3.0.1
- Fixed a bug that caused DataFrame uploads to fail when the time column type is not Long
- Fixed a bug when reading Map type values inside a column
- Fixed the partition reader to respect the `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` configuration parameters. This reduces the number of necessary Spark tasks by packing multiple partition reads into a single task (see the sketch after this list). See also the [Spark SQL Performance Tuning Guide](https://spark.apache.org/docs/latest/sql-performance-tuning.html#other-configuration-options).
- Internal library version upgrades:
  - Upgrade Jackson to 2.10.5
  - Upgrade json4s to 3.6.6
  - Upgrade Fluency to 2.4.1
  - Upgrade presto-jdbc to 338 to fix a performance issue with JDK11
  - Upgrade Airframe to 20.10.0
  - Upgrade to Scala 2.11.12 and Scala 2.12.12
  - Upgrade td-client-java to 0.9.3
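To illustrate the tuning note above: both parameters are standard Spark SQL options that can be set when building the session. A minimal sketch; the values shown are Spark's defaults, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pack multiple small TD partition reads into fewer Spark tasks.
val spark = SparkSession.builder()
  .appName("td-spark-partition-read-tuning")
  // Upper bound on the bytes packed into a single read task (default: 128 MB)
  .config("spark.sql.files.maxPartitionBytes", "134217728")
  // Estimated cost, in bytes, of opening a file; raising it packs more
  // small partitions into each task (default: 4 MB)
  .config("spark.sql.files.openCostInBytes", "4194304")
  .getOrCreate()
```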
### Downloads

- [td-spark-assembly-20.10.0_spark2.4.7.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.10.0_spark2.4.7.jar) (Spark 2.4.7, Scala 2.11)
- [td-spark-assembly-20.10.0_spark3.0.1.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.10.0_spark3.0.1.jar) (Spark 3.0.1, Scala 2.12)

## v20.6.2

- Fixed the handling of HTTP responses when receiving 5xx errors from APIs

### Downloads

- [td-spark-assembly-20.6.2_spark2.4.6.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.6.2_spark2.4.6.jar) (Spark 2.4.6, Scala 2.11)
- [td-spark-assembly-20.6.2_spark3.0.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.6.2_spark3.0.0.jar) (Spark 3.0.0, Scala 2.12)

## v20.6.1

This release supports Spark 2.4.6 and Spark 3.0.0 (official release).

### Downloads

- [td-spark-assembly-20.6.1_spark2.4.6.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.6.1_spark2.4.6.jar) (Spark 2.4.6, Scala 2.11)
- [td-spark-assembly-20.6.1_spark3.0.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.6.1_spark3.0.0.jar) (Spark 3.0.0, Scala 2.12)

## v20.6.0

### Downloads

- [td-spark-assembly-20.6.0_spark2.4.5.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.6.0_spark2.4.5.jar) (Spark 2.4.5, Scala 2.11)
- [td-spark-assembly-20.6.0_spark3.0.0-preview2.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.6.0_spark3.0.0-preview2.jar) (Spark 3.0.0-preview2, Scala 2.12)

### Major Changes

- Support swapping table contents

### Bug Fixes

- Bump msgpack-java to 0.8.20 for JDK8 compatibility
- Fixed an NPE when reading specific Array column values
- Handle 504 responses properly

### Internal Changes

- Upgrade to [Airframe 20.5.2](https://wvlet.org/airframe/docs/release-notes.html#2052)

## v20.4.0

### Downloads

- [td-spark-assembly-20.4.0_spark2.4.5.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.4.0_spark2.4.5.jar) (Spark 2.4.5, Scala 2.11)
- [td-spark-assembly-20.4.0_spark3.0.0-preview2.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.4.0_spark3.0.0-preview2.jar) (Spark 3.0.0-preview2, Scala 2.12)

### Changes

- Spark 2.4.5 support
- Support `ap02` for the `spark.td.site` configuration

## v20.2.0

### Downloads

- [td-spark-assembly-20.2.0_spark2.4.4.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.2.0_spark2.4.4.jar) (Spark 2.4.4, Scala 2.11)
- [td-spark-assembly-20.2.0_spark3.0.0-preview2.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-20.2.0_spark3.0.0-preview2.jar) (Spark 3.0.0-preview2, Scala 2.12)

### Changes

- Spark 3.0.0-preview2 support

## v19.11.1

### Downloads

- [td-spark-assembly-19.11.1_spark2.4.4.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-19.11.1_spark2.4.4.jar) (Spark 2.4.4, Scala 2.11)

### Bug Fixes

- Fixed a bug in uploading DataFrames whose time column contains null or non-unixtime values
- Fixed an error when installing td_pyspark with Python 2

## v19.11.0

### Downloads

- [td-spark-assembly-19.11.0_spark2.4.4.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-19.11.0_spark2.4.4.jar) (Spark 2.4.4, Scala 2.11)
- [td-spark-assembly-19.11.0_spark3.0.0-preview.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly-19.11.0_spark3.0.0-preview.jar) (Spark 3.0.0-preview, Scala 2.12)

Commands for running spark-shell with Docker:

- Spark 2.4.4: `docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-shell:19.11.0_spark2.4.4`
- Spark 3.0.0-preview: `docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-shell:19.11.0_spark3.0.0-preview`
- PySpark 2.4.4: `docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-pyspark:19.11.0_spark2.4.4`
- PySpark 3.0.0.dev0: `docker run -it -e TD_API_KEY=$TD_API_KEY armtd/td-spark-pyspark:19.11.0_spark3.0.0-preview`

### Major Changes

- Support Spark 2.4.4 (Scala 2.11) and Spark 3.0.0-preview (Scala 2.12, pyspark 3.0.0.dev0)
- Support using multiple TD accounts with `val td2 = td.withApiKey("...")` (Scala) or `td2 = td.with_apikey("...")` (Python), as in the sketch below
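A minimal Scala sketch of the multi-account feature; the API key placeholder and table names are illustrative only.

```scala
// Assumes `td` is the td-spark context of the default account (spark.td).
// Create a second context bound to another account's API key.
val td2 = td.withApiKey("<apikey-of-another-account>")

val df1 = td.table("db1.access_logs").df   // read with the default account
val df2 = td2.table("db2.access_logs").df  // read with the second account
```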
### Bug Fixes

- Fixed the table preview of array column values inserted from td-spark

### Internal Changes

- Upgrade to [Airframe 19.11.0](https://wvlet.org/airframe/docs/release-notes.html#19110)

## v19.7.0

### Downloads

- [td-spark-assembly_2.11-19.7.0.jar](https://td-spark.s3.amazonaws.com/td-spark-assembly_2.11-19.7.0.jar) (Spark 2.4.3, Scala 2.11)

### Major Changes

- Fully support PySpark. Install the package from PyPI: https://pypi.org/project/td-pyspark/

### Bug Fixes

- Fixed a scala-parser-combinators error when using `td.presto(sql)` (usage sketch below)
- Bump Fluency to 2.3.2 with a configuration fix
- Add retries around drop table/database operations

### Internal Changes

- Upgrade to [Airframe 19.8.9](https://wvlet.org/airframe/docs/release-notes.html#1989)
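For context, `td.presto(sql)` runs a Presto query and returns the result as a DataFrame. A minimal sketch; the query against TD's public sample dataset is only an example.

```scala
// Run a Presto query through the td context and get a Spark DataFrame back.
// Substitute your own SQL and database names.
val df = td.presto("SELECT code, count(*) AS cnt FROM sample_datasets.www_access GROUP BY 1")
df.show()
```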