Features
The DocETL playground provides an interactive environment for building and testing document processing pipelines. Here are the key features:
Current Features
Hybrid Interface
The playground combines a notebook and a spreadsheet interface, allowing you to:
- Iteratively develop and test pipeline operations
- Inspect operation outputs in a tabular format
- Switch between code and data views
Performance Optimizations
To keep interaction responsive, the playground provides:
- Smart sampling of large datasets for quick iteration
- Automatic caching of operation results
- Efficient handling of LLM API calls
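As a rough illustration of why sampling and caching keep iteration fast, here is a minimal Python sketch. It is not the playground's implementation: `sample_documents`, `cache_key`, and `CachedOperation` are hypothetical names, and the cache is a plain in-memory dictionary keyed by a hash of the operation config and the input document.

```python
import hashlib
import json
import random


def sample_documents(docs, k, seed=0):
    """Return up to k documents, chosen with a fixed seed so reruns see the same sample."""
    if len(docs) <= k:
        return list(docs)
    return random.Random(seed).sample(docs, k)


def cache_key(operation_config, doc):
    """Hash the operation config together with the document (both assumed JSON-serializable)."""
    payload = json.dumps({"op": operation_config, "doc": doc}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedOperation:
    """Hypothetical wrapper: runs fn once per (config, document) pair and reuses the result."""

    def __init__(self, operation_config, fn):
        self.operation_config = operation_config
        self.fn = fn
        self.cache = {}
        self.calls = 0  # number of times fn actually ran

    def run(self, doc):
        key = cache_key(self.operation_config, doc)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(doc)
        return self.cache[key]


# Stand-in for an expensive LLM call: count the words in each document.
docs = [{"id": i, "text": f"document number {i}"} for i in range(1000)]
sample = sample_documents(docs, k=5)
op = CachedOperation(
    {"name": "word_count", "type": "map"},
    lambda d: {**d, "n_words": len(d["text"].split())},
)
first_run = [op.run(d) for d in sample]
second_run = [op.run(d) for d in sample]  # served entirely from the cache
```

In this sketch, editing the operation config changes the cache key, so a modified operation is recomputed while unchanged ones are reused.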
Output Management
- Add notes and highlights to important outputs
- Save and organize findings during pipeline development
- Track key insights and results
Export Capabilities
- Export results from any operation to CSV
- Preserve intermediate results for further analysis
- Share outputs with team members
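For readers who want to post-process results outside the playground, the following sketch shows one way tabular operation output could be written as CSV with Python's standard `csv` module. The function name `rows_to_csv` and the sample rows are hypothetical; the actual export format produced by the playground may differ.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text.

    The header is the union of all keys in first-seen order;
    keys missing from a row are written as empty cells.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical output of an intermediate operation.
rows = [
    {"id": 1, "summary": "short, with a comma"},
    {"id": 2, "summary": "another summary", "note": "needs review"},
]
csv_text = rows_to_csv(rows)
```

Taking the union of keys keeps columns that appear only on some rows, which is common when later operations add fields to a subset of documents.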
Upcoming Features
We're actively working on several exciting ideas:
Natural Language Pipeline Assistant
- Generate and indirectly modify pipelines using natural language
- Interactive help for pipeline development
Enhanced Validation UI
- Per-document retry capabilities for failed operations
- UI support for configuring gleaning validation directly, instead of through extra kwargs
- Visual feedback for validation results
Pipeline Optimization Interface
- Interactive tools for optimizing operation performance
- Visual pipeline analysis and bottleneck identification
- Suggestions for pipeline efficiency improvements
Join the Development
Interested in these upcoming features? Join our Discord community to give feedback and help shape their development!