Unnest Operation
The Unnest operation in DocETL is designed to expand an array field or a dictionary in the input data into multiple items. This operation is particularly useful when you need to process or analyze individual elements of an array or specific fields of a nested dictionary separately.
How Unnest Works
The Unnest operation behaves differently depending on the type of data being unnested:
- For list-type unnesting: It replaces the original key with each individual element from the list.
- For dictionary-type unnesting: It adds new keys to the parent dictionary based on the `expand_fields` parameter.
Unnest does not have an output schema. It modifies the structure of your data in place.
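The list-type case can be sketched in plain Python. This is only an illustration of the behavior described above, not DocETL's implementation; the helper name `unnest_list` and the sample order are hypothetical.

```python
from copy import deepcopy


def unnest_list(item, unnest_key):
    """Hypothetical helper: emit one copy of `item` per element of item[unnest_key]."""
    outputs = []
    for element in item[unnest_key]:
        new_item = deepcopy(item)        # all other key-value pairs are carried over
        new_item[unnest_key] = element   # the list is replaced by a single element
        outputs.append(new_item)
    return outputs


# Made-up input: one order holding a list of products.
order = {"order_id": 17, "products": ["pen", "ink", "paper"]}
items = unnest_list(order, "products")

for item in items:
    print(item)
```

One input item with three products becomes three output items, each keeping `order_id` and holding a single product under the `products` key.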
Motivation
The Unnest operation is valuable in scenarios where you need to:
- Process individual items from a list of products in an order
- Analyze separate entries in a list of comments or reviews
- Expand nested data structures for more granular processing
- Flatten complex data structures for easier analysis
Configuration
Required Parameters
Parameter | Description |
---|---|
type | Must be set to "unnest" |
name | A unique name for the operation |
unnest_key | The key of the array or dictionary field to unnest |
Optional Parameters
Parameter | Description | Default |
---|---|---|
keep_empty | If true, empty arrays being exploded will be kept in the output (with value None) | false |
expand_fields | A list of fields to expand from the nested dictionary into the parent dictionary, if unnesting a dict | [] |
recursive | If true, the unnest operation will be applied recursively to nested arrays | false |
depth | The maximum depth for recursive unnesting (only applicable if recursive is true) | inf |
sample | Number of samples to use for the operation | None |
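To make the effect of `keep_empty` concrete, here is a small Python sketch based on the table's description. It assumes that with the default (`false`) an item whose array is empty produces no output at all. The function `explode` and the sample posts are hypothetical, not part of DocETL.

```python
def explode(item, unnest_key, keep_empty=False):
    """Hypothetical helper illustrating the keep_empty option for list-type unnesting."""
    values = item[unnest_key]
    if isinstance(values, list) and len(values) == 0:
        # Empty array: keep one item with the key set to None, or drop the item entirely.
        return [{**item, unnest_key: None}] if keep_empty else []
    return [{**item, unnest_key: value} for value in values]


# Made-up data: the second post has no comments.
posts = [
    {"post_id": 1, "comments": ["nice", "agreed"]},
    {"post_id": 2, "comments": []},
]

dropped = [out for post in posts for out in explode(post, "comments")]
kept = [out for post in posts for out in explode(post, "comments", keep_empty=True)]

print(len(dropped), len(kept))
```

With `keep_empty` enabled, later operations receive an item for post 2 and need to handle a `None` value for `comments`.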
Output
The Unnest operation modifies the structure of your data:
- For list-type unnesting: It generates multiple output items for each input item, replacing the original array in the `unnest_key` field with individual elements.
- For dictionary-type unnesting: It expands the specified fields into the parent dictionary.
All other original key-value pairs from the input item are preserved in the output.
Note
When unnesting dictionaries, the original nested dictionary is preserved in the output, and the specified fields are expanded into the parent dictionary.
Use Cases
- Product Analysis in Orders: Unnest a list of products in each order, then use a map operation to analyze each product individually.
- Comment Sentiment Analysis: Unnest a list of comments for each post, enabling sentiment analysis on individual comments.
- Nested Data Structure Flattening: Unnest complex nested data structures to create a flattened dataset for easier analysis or processing.
- Processing Time Series Data: Unnest time series data stored in arrays to analyze individual time points.
Example: Analyzing Product Reviews
Let's walk through an example of using the Unnest operation to prepare product reviews for detailed analysis.
```yaml
- name: extract_salient_quotes
  type: map
  prompt: |
    For the following product review, extract up to 3 salient quotes that best represent the reviewer's opinion:
    {{ input.review_text }}
    For each quote, provide the text and its sentiment (positive, negative, or neutral).
  output:
    schema:
      salient_quotes: list[string]

- name: unnest_quotes
  type: unnest
  unnest_key: salient_quotes

- name: analyze_quote
  type: map
  prompt: |
    Analyze the following quote from a product review:
    Quote & information: {{ input.salient_quotes }}
    Review text: {{ input.review_text }}
    Provide a detailed analysis of the quote, including:
    1. The specific aspect of the product being discussed
    2. The strength of the sentiment (-5 to 5, where -5 is extremely negative and 5 is extremely positive)
    3. Any key terms or phrases that stand out
  output:
    schema:
      product_aspect: string
      sentiment_strength: number
      key_terms: list[string]
```
This example demonstrates how the Unnest operation fits into a pipeline for analyzing product reviews:
- The first Map operation extracts salient quotes from each review.
- The Unnest operation expands the `salient_quotes` array, creating individual items for each quote. Each quote can now be accessed via `input.salient_quotes`.
- The second Map operation performs a detailed analysis on each individual quote.
By unnesting the quotes, we enable more granular analysis that wouldn't be possible if we processed the entire review as a single unit.
Advanced Features
Recursive Unnesting
When dealing with deeply nested structures, you can use the `recursive` parameter to apply the unnest operation at multiple levels:
```yaml
- name: recursive_unnest
  type: unnest
  unnest_key: nested_data
  recursive: true
  depth: 3  # Limit recursion to 3 levels deep
```
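A rough Python sketch of how recursive unnesting with a depth limit could behave is shown below. It is not DocETL's implementation. It assumes that `depth` counts how many levels of nested lists are flattened, so that a depth of 1 matches a non-recursive unnest. The function name and sample data are hypothetical.

```python
import math
from copy import deepcopy


def unnest_recursive(item, unnest_key, depth=math.inf, _level=1):
    """Hypothetical helper: flatten nested lists under `unnest_key`, at most `depth` levels."""
    value = item[unnest_key]
    if not isinstance(value, list):
        return [item]
    outputs = []
    for element in value:
        new_item = deepcopy(item)
        new_item[unnest_key] = element
        if isinstance(element, list) and _level < depth:
            # The element is itself a list and the depth budget allows another level.
            outputs.extend(unnest_recursive(new_item, unnest_key, depth, _level + 1))
        else:
            outputs.append(new_item)
    return outputs


# Made-up input with three levels of nesting.
record = {"id": 1, "nested_data": [[1, 2], [3, [4, 5]]]}

fully_flat = [out["nested_data"] for out in unnest_recursive(record, "nested_data")]
two_levels = [out["nested_data"] for out in unnest_recursive(record, "nested_data", depth=2)]

print(fully_flat)
print(two_levels)
```

Under these assumptions, the unlimited run yields one item per leaf value, while `depth=2` stops early and leaves the innermost list `[4, 5]` intact in a single item.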
Dictionary Expansion
When unnesting dictionaries, you can use the `expand_fields` parameter to flatten specific fields into the parent structure:
```yaml
- name: expand_user_data
  type: unnest
  unnest_key: user_info
  expand_fields:
    - name
    - age
    - location
```
In this case, `name`, `age`, and `location` would be added as new keys in the parent dictionary, alongside the original `user_info` key.
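The dictionary case can be illustrated with a short Python sketch. It follows the behavior described above (selected fields are copied up, the nested dictionary stays in place) and is not DocETL's code. The helper name, the sample record, and the choice to skip fields missing from the nested dictionary are assumptions.

```python
def expand_dict_fields(item, unnest_key, expand_fields):
    """Hypothetical helper: copy selected fields of item[unnest_key] into the parent."""
    new_item = dict(item)            # the original nested dictionary is preserved
    nested = item[unnest_key]
    for field in expand_fields:
        if field in nested:          # assumption: fields absent from the dict are skipped
            new_item[field] = nested[field]
    return new_item


# Made-up record; "email" is not listed in expand_fields, so it is not copied up.
record = {
    "user_id": 7,
    "user_info": {"name": "Ada", "age": 36, "location": "London", "email": "ada@example.com"},
}

expanded = expand_dict_fields(record, "user_info", ["name", "age", "location"])
print(expanded)
```

The result still contains `user_info` in full, plus top-level `name`, `age`, and `location` keys.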
Best Practices
- Choose the Right Unnest Key: Ensure you're unnesting the correct field that contains the array or nested structure you want to expand.
- Consider Data Volume: Unnesting can significantly increase the number of items in your data stream. Be mindful of this when designing subsequent operations in your pipeline.
- Use Expand Fields Wisely: When unnesting dictionaries, use the `expand_fields` parameter to flatten your data structure if needed, but be cautious of potential key conflicts.
- Handle Empty Arrays: Decide whether empty arrays should be kept (using `keep_empty`) based on your specific use case and how subsequent operations should handle null values.
- Preserve Context: When unnesting, consider whether you need to carry forward any context from the parent item. The unnest operation preserves all other fields, which helps maintain context.
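One way to act on the key-conflict caution is to check your data before running the pipeline. The Python sketch below only detects fields in `expand_fields` that already exist on the parent item. How DocETL resolves such a conflict is not stated here, so the sketch makes no claim about it. The function name and sample record are hypothetical.

```python
def find_key_conflicts(item, expand_fields):
    """Hypothetical pre-check: list expand_fields that already exist on the parent item."""
    return sorted(field for field in expand_fields if field in item)


# Made-up record: the parent already has a "name" key, so expanding "name" would collide.
record = {"name": "order-17", "user_info": {"name": "Ada", "age": 36}}

conflicts = find_key_conflicts(record, ["name", "age"])
print(conflicts)
```

If the check reports conflicts, renaming the parent key in an earlier step or dropping the field from `expand_fields` avoids the ambiguity.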