Apache Toree is a Jupyter kernel that enables interactive access to Apache Spark. It supports multiple languages including Scala and Python, allowing users to run Spark code in notebooks for data analysis, visualization, and distributed computing workflows.

Vendor

The Apache Software Foundation

Company Website

Product details

Apache Toree

Apache Toree is an open-source kernel for Jupyter Notebook that provides interactive access to Apache Spark. It lets users write and execute Spark code in Scala, Python, and R directly within notebooks. Toree implements the IPython (now Jupyter) messaging protocol over ZeroMQ (0MQ), offering a flexible and extensible environment for data exploration, visualization, and distributed computing.
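
To make the messaging layer more concrete, here is a minimal Python sketch of how a notebook client might build and sign an execute_request message in the style of the Jupyter messaging protocol. It is illustrative only and is not Toree's own code (Toree is implemented in Scala). The function names, the session id, the signing key, and the protocol version string "5.3" are assumptions chosen for the example.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def build_execute_request(code, session_id, username="notebook-user"):
    """Build the four dicts that make up an execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version for this sketch
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def sign_and_serialize(message, key):
    """Serialize the message parts to JSON and sign them with HMAC-SHA256."""
    parts = [
        json.dumps(message[name]).encode("utf-8")
        for name in ("header", "parent_header", "metadata", "content")
    ]
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    signature = mac.hexdigest().encode("ascii")
    # Frame layout: delimiter, signature, then the four serialized dicts.
    return [b"<IDS|MSG>", signature] + parts


if __name__ == "__main__":
    # Hypothetical session id and key; a real client reads the key
    # from the kernel's connection file.
    request = build_execute_request("sc.parallelize(1 to 10).sum()", "demo-session")
    for frame in sign_and_serialize(request, b"secret-key"):
        print(frame)
```

In a real deployment these frames would travel over a ZeroMQ socket to the kernel's shell channel, and the kernel would recompute the signature with the shared key before running the code.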

Features

  • Interactive Spark access via Jupyter Notebooks
  • Multi-language support: Scala, Python, and R
  • Built on IPython messaging protocol and 0MQ
  • Support for Spark magics to extend notebook functionality
  • Integration with visualization tools like Brunel
  • Automatic SparkContext creation for ease of use
  • Remote and local client-server communication models
  • Multi-client support for shared Spark contexts

Capabilities

  • Executes Spark code snippets and returns results in real time
  • Allows dynamic resource management and context manipulation
  • Supports remote execution and multi-tenancy setups
  • Enables rich visualizations of Spark DataFrames
  • Provides extensibility through custom magics and plugins
  • Facilitates interactive data science and machine learning workflows

Benefits

  • Simplifies Spark development and experimentation
  • Enhances productivity with real-time feedback in notebooks
  • Promotes reproducibility and collaboration in data projects
  • Reduces setup complexity with automatic context binding
  • Encourages modular and extensible notebook environments
  • Bridges the gap between data engineering and data science