Features

The DocETL playground provides an interactive environment for building and testing document processing pipelines. Here are the key features:

Current Features

Hybrid Interface

The playground offers a unique hybrid between a notebook and spreadsheet interface, allowing you to:

  • Iteratively develop and test pipeline operations
  • Inspect operation outputs in a tabular format
  • Seamlessly switch between code and data views

Performance Optimizations

To ensure responsive interaction:

  • Smart sampling of large datasets for quick iteration
  • Automatic caching of operation results
  • Efficient handling of LLM API calls
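As a rough illustration of how sampling and result caching can work together, here is a minimal Python sketch. It is not the playground's actual implementation: the names `sample_documents`, `cached_run`, and `run_fn` are hypothetical, and the sketch assumes documents and operation configs are JSON-serializable.

```python
import hashlib
import json
import random

def sample_documents(docs, k, seed=0):
    """Return up to k documents, chosen reproducibly for a given seed."""
    if k >= len(docs):
        return list(docs)
    return random.Random(seed).sample(docs, k)

# In-memory cache keyed by a hash of the operation config and its input.
_cache = {}

def cached_run(op_config, docs, run_fn):
    """Call run_fn(op_config, docs) once per distinct (config, input) pair."""
    payload = json.dumps({"op": op_config, "docs": docs}, sort_keys=True)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = run_fn(op_config, docs)  # hypothetical operation runner
    return _cache[key]
```

Because the sample is seeded, rerunning an unchanged operation on the same sample produces the same cache key, so the (potentially expensive) LLM call is skipped. Changing the operation config changes the key and triggers a fresh run.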

Output Management

  • Add notes and highlights to important outputs
  • Save and organize findings during pipeline development
  • Track key insights and results

Export Capabilities

  • Export results from any operation to CSV
  • Preserve intermediate results for further analysis
  • Share outputs with team members
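To give a sense of what a CSV export of operation results involves, the following Python sketch serializes a list of output rows using only the standard library. The function name `rows_to_csv` and the choice to encode nested values as JSON strings are assumptions made for illustration; the playground's own export format may differ.

```python
import csv
import io
import json

def rows_to_csv(rows):
    """Serialize a list of dict rows to CSV text.

    Columns are the union of all keys, in first-seen order.
    Nested lists and dicts are stored as JSON strings.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            k: json.dumps(v) if isinstance(v, (dict, list)) else v
            for k, v in row.items()
        })
    return buffer.getvalue()
```

Rows that lack a column are written with an empty cell, which keeps intermediate results from different operations loadable in a spreadsheet even when their schemas differ.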

Upcoming Features

We're actively working on several exciting ideas:

Natural Language Pipeline Assistant

  • Generate and modify pipelines using natural language
  • Interactive help for pipeline development

Enhanced Validation UI

  • Per-document retry capabilities for failed operations
  • UI support for gleaning validation outside of extra kwargs
  • Visual feedback for validation results

Pipeline Optimization Interface

  • Interactive tools for optimizing operation performance
  • Visual pipeline analysis and bottleneck identification
  • Suggestions for pipeline efficiency improvements

Join the Development

Interested in these upcoming features? Join our Discord community to provide feedback and help shape their development!