td-spark API Documentation
==========================

td-spark is a library for reading and writing tables in `Treasure Data`_ through the DataFrame API of `Apache Spark`_. For Python users, the `td-pyspark`_ PyPI package for `PySpark`_ is available.

.. _`Treasure Data`: https://treasuredata.com
.. _`Apache Spark`: https://spark.apache.org
.. _`PySpark`: http://spark.apache.org/docs/latest/api/python/index.html
.. _`td-pyspark`: https://pypi.org/project/td-pyspark/

.. image:: ./img/td-overview.png

Features
--------

**Important**: Access to Treasure Data through td-spark is disabled by default. Contact support@treasure-data.com first to enable this feature.

td-spark supports:

- Reading and writing tables in TD through Spark DataFrames.
- Running Spark SQL queries against DataFrames.
- Submitting Presto/Hive SQL queries to TD and reading the query results as DataFrames.
- Using Spark DataFrames and Pandas DataFrames interchangeably in PySpark.
- Running on any hosted Spark service, such as Amazon EMR or Databricks Cloud.
- Running PySpark on Google Colaboratory.

.. toctree::
   :maxdepth: 2
   :caption: Download & Release Notes

   release_notes

* `Previous Release Notes `_

.. toctree::
   :maxdepth: 2
   :caption: td-spark (Scala)

   td-spark

.. toctree::
   :maxdepth: 2
   :caption: td-pyspark (Python)

   getting_started_py
   td_pyspark
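As a quick illustration of the features above, the following is a minimal PySpark sketch using td-pyspark. It assumes td-pyspark is installed (``pip install td-pyspark``), that your TD API key and site are configured (for example via ``spark.td.apikey`` and ``spark.td.site`` in your Spark configuration), and that access has been enabled by support. The database and table names are placeholders from TD's public sample datasets; substitute your own.

.. code-block:: python

   # A minimal sketch, assuming td-pyspark is installed and
   # spark.td.apikey / spark.td.site are set in the Spark config.
   import td_pyspark
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.appName("td-spark-example").getOrCreate()
   td = td_pyspark.TDSparkContext(spark)

   # Read a TD table as a Spark DataFrame
   df = td.table("sample_datasets.www_access").df()
   df.show(5)

   # Submit a Presto query to TD and read the result as a DataFrame
   result = td.presto(
       "SELECT method, count(*) AS cnt "
       "FROM sample_datasets.www_access GROUP BY 1"
   )
   result.show()

   # In PySpark, Spark DataFrames convert to Pandas DataFrames
   pdf = result.toPandas()

See the td-pyspark section below for the full API and configuration details.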