td-spark API Documentation
==========================

td-spark is a library for reading and writing tables in `Treasure Data`_ through the DataFrame API of `Apache Spark`_. For Python users, the `td-pyspark`_ PyPI package for `PySpark`_ is available.

.. _`Treasure Data`: https://treasuredata.com
.. _`Apache Spark`: https://spark.apache.org
.. _`PySpark`: http://spark.apache.org/docs/latest/api/python/index.html
.. _`td-pyspark`: https://pypi.org/project/td-pyspark/

.. image:: ./img/td-overview.png

Features
--------

**Important**: Access to Treasure Data through td-spark is disabled by default. Contact support@treasure-data.com first to enable this feature.

td-spark supports:

- Reading and writing tables in TD through Spark DataFrames.
- Running Spark SQL queries against DataFrames.
- Submitting Presto/Hive SQL queries to TD and reading the query results as DataFrames.
- Using Spark DataFrames and Pandas DataFrames interchangeably in PySpark.
- Running on any hosted Spark service, such as Amazon EMR or Databricks Cloud.
- Running PySpark on Google Colaboratory.

.. toctree::
   :maxdepth: 2
   :caption: Download & Release Notes

   release_notes

* `Previous Release Notes `_

.. toctree::
   :maxdepth: 2
   :caption: td-spark (Scala)

   td-spark

.. toctree::
   :maxdepth: 2
   :caption: td-pyspark (Python)

   getting_started_py
   td_pyspark
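As a quick illustration of the features above, the following is a minimal PySpark sketch using td-pyspark. It assumes td-pyspark is installed (``pip install td-pyspark``), that your TD API key and site are configured (for example via ``spark.td.apikey`` and ``spark.td.site`` in your Spark configuration), and that access has been enabled by support. The database and table names are placeholders from TD's public sample datasets; substitute your own.

.. code-block:: python

   # A minimal sketch, assuming td-pyspark is installed and
   # spark.td.apikey / spark.td.site are set in the Spark config.
   import td_pyspark
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.appName("td-spark-example").getOrCreate()
   td = td_pyspark.TDSparkContext(spark)

   # Read a TD table as a Spark DataFrame
   df = td.table("sample_datasets.www_access").df()
   df.show(5)

   # Submit a Presto query to TD and read the result as a DataFrame
   result = td.presto(
       "SELECT method, count(*) AS cnt "
       "FROM sample_datasets.www_access GROUP BY 1"
   )
   result.show()

   # In PySpark, Spark DataFrames convert to Pandas DataFrames
   pdf = result.toPandas()

See the td-pyspark section below for the full API and configuration details.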