Unnest Operation
The Unnest operation expands an array field or a dictionary in the input data into multiple items, so individual elements can be processed separately.
```mermaid
flowchart LR
    d["doc with items=[a, b]"] --> r1["doc with items=a"]
    d --> r2["doc with items=b"]
```
How Unnest Works
The Unnest operation behaves differently depending on the type of data being unnested:
- For list-type unnesting: It replaces the value of the original key with each individual element of the list, producing one output item per element.
- For dictionary-type unnesting: It adds new keys to the parent dictionary based on the `expand_fields` parameter.
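The following plain-Python sketch makes the list-type behavior concrete and mirrors the diagram above. It is not DocETL's implementation. The function name `unnest_list` is hypothetical, and items are assumed to be ordinary dictionaries.

```python
import copy
from typing import Any


def unnest_list(items: list[dict[str, Any]], unnest_key: str) -> list[dict[str, Any]]:
    """Hypothetical sketch: emit one output item per element of item[unnest_key]."""
    results: list[dict[str, Any]] = []
    for item in items:
        for element in item[unnest_key]:
            new_item = copy.deepcopy(item)  # copy so every other key is preserved
            new_item[unnest_key] = element  # replace the list with a single element
            results.append(new_item)
    return results


docs = [{"id": 1, "items": ["a", "b"]}]
print(unnest_list(docs, "items"))
# [{'id': 1, 'items': 'a'}, {'id': 1, 'items': 'b'}]
```

Deep-copying each item keeps the outputs independent of one another and leaves the input untouched.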
Unnest does not have an output schema. It only changes the structure of your data, not its contents.
Configuration
Required Parameters
| Parameter | Description |
|---|---|
| type | Must be set to "unnest" |
| name | A unique name for the operation |
| unnest_key | The key of the array (or dictionary) field to unnest |
Optional Parameters
| Parameter | Description | Default |
|---|---|---|
| keep_empty | If true, items whose array is empty are kept in the output, with the unnested key set to None | false |
| expand_fields | A list of fields to expand from the nested dictionary into the parent dictionary, if unnesting a dict | [] |
| recursive | If true, the unnest operation will be applied recursively to nested arrays | false |
| depth | The maximum depth for recursive unnesting (only applicable if recursive is true) | inf |
| sample | Number of samples to use for the operation | None |
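The effect of `keep_empty` on an empty array can be sketched in plain Python. This sketch assumes, as the table states, that a kept item carries None in place of the empty array, and that without `keep_empty` the item is dropped. The function `unnest_list` is a hypothetical illustration, not DocETL's code.

```python
import copy
from typing import Any


def unnest_list(
    items: list[dict[str, Any]], unnest_key: str, keep_empty: bool = False
) -> list[dict[str, Any]]:
    """Hypothetical sketch of list-type unnesting with a keep_empty switch."""
    results: list[dict[str, Any]] = []
    for item in items:
        values = item[unnest_key]
        if not values:
            if keep_empty:
                # Keep the item once, with the unnested key set to None.
                kept = copy.deepcopy(item)
                kept[unnest_key] = None
                results.append(kept)
            continue  # otherwise the item disappears from the output
        for element in values:
            new_item = copy.deepcopy(item)
            new_item[unnest_key] = element
            results.append(new_item)
    return results


docs = [{"id": 1, "tags": ["x"]}, {"id": 2, "tags": []}]
print(unnest_list(docs, "tags"))                   # [{'id': 1, 'tags': 'x'}]
print(unnest_list(docs, "tags", keep_empty=True))  # [{'id': 1, 'tags': 'x'}, {'id': 2, 'tags': None}]
```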
Output
The Unnest operation modifies the structure of your data:
- For list-type unnesting: It generates multiple output items for each input item, replacing the original array in the `unnest_key` field with individual elements.
- For dictionary-type unnesting: It expands the specified fields into the parent dictionary.
All other original key-value pairs from the input item are preserved in the output.
Note
When unnesting dictionaries, the original nested dictionary is preserved in the output, and the specified fields are expanded into the parent dictionary.
Use Cases
- Product Analysis in Orders: Unnest a list of products in each order, then use a map operation to analyze each product individually.
- Comment Sentiment Analysis: Unnest a list of comments for each post, enabling sentiment analysis on individual comments.
- Nested Data Structure Flattening: Unnest complex nested data structures to create a flattened dataset for easier analysis or processing.
- Processing Time Series Data: Unnest time series data stored in arrays to analyze individual time points.
Example: Analyzing Product Reviews
```yaml
- name: extract_salient_quotes
  type: map
  prompt: |
    For the following product review, extract up to 3 salient quotes that best represent the reviewer's opinion:
    {{ input.review_text }}
    For each quote, provide the text and its sentiment (positive, negative, or neutral).
  output:
    schema:
      salient_quotes: list[string]

- name: unnest_quotes
  type: unnest
  unnest_key: salient_quotes

- name: analyze_quote
  type: map
  prompt: |
    Analyze the following quote from a product review:
    Quote & information: {{ input.salient_quotes }}
    Review text: {{ input.review_text }}
    Provide a detailed analysis of the quote, including:
    1. The specific aspect of the product being discussed
    2. The strength of the sentiment (-5 to 5, where -5 is extremely negative and 5 is extremely positive)
    3. Any key terms or phrases that stand out
  output:
    schema:
      product_aspect: string
      sentiment_strength: number
      key_terms: list[string]
```
```python
import docetl

docetl.default_model = "gpt-4o-mini"

frame = docetl.read_json("reviews.json")

# Step 1: extract up to 3 salient quotes per review
frame = frame.map(
    prompt="""For the following product review, extract up to 3 salient quotes that best represent the reviewer's opinion:
{{ input.review_text }}
For each quote, provide the text and its sentiment (positive, negative, or neutral).""",
    output={"schema": {"salient_quotes": "list[string]"}},
)

# Step 2: one item per quote
frame = frame.unnest(unnest_key="salient_quotes")

# Step 3: analyze each quote individually
frame = frame.map(
    prompt="""Analyze the following quote from a product review:
Quote & information: {{ input.salient_quotes }}
Review text: {{ input.review_text }}
Provide a detailed analysis of the quote, including:
1. The specific aspect of the product being discussed
2. The strength of the sentiment (-5 to 5, where -5 is extremely negative and 5 is extremely positive)
3. Any key terms or phrases that stand out""",
    output={
        "schema": {
            "product_aspect": "string",
            "sentiment_strength": "number",
            "key_terms": "list[string]",
        }
    },
)

rows = frame.collect()
```
After unnesting, each quote becomes its own item, accessible via input.salient_quotes in the second map.
Advanced Features
Recursive Unnesting
When dealing with deeply nested structures, you can use the recursive parameter to apply the unnest operation at multiple levels:
```yaml
- name: recursive_unnest
  type: unnest
  unnest_key: nested_data
  recursive: true
  depth: 3  # Limit recursion to 3 levels deep
```
```python
frame = frame.unnest(
    name="recursive_unnest",
    unnest_key="nested_data",
    recursive=True,
    depth=3,  # Limit recursion to 3 levels deep
)
```
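The exact recursion rules are not spelled out here. This plain-Python sketch therefore rests on two assumptions: each level of nested lists consumes one unit of `depth`, and without `recursive` only the outermost list is unnested. The function `unnest_recursive` is hypothetical and is not DocETL's implementation.

```python
import copy
import math
from typing import Any


def unnest_recursive(
    item: dict[str, Any],
    key: str,
    recursive: bool = False,
    depth: float = math.inf,
    level: int = 0,
) -> list[dict[str, Any]]:
    """Hypothetical sketch; assumes each nested list level consumes one unit of depth."""
    max_depth = depth if recursive else 1  # non-recursive: unnest one level only
    value = item[key]
    if level >= max_depth or not isinstance(value, list):
        return [item]  # stop: depth exhausted or the value is not a list
    results: list[dict[str, Any]] = []
    for element in value:
        new_item = copy.deepcopy(item)
        new_item[key] = element
        results.extend(unnest_recursive(new_item, key, recursive, depth, level + 1))
    return results


data = {"id": 7, "nested_data": [[1, 2], [3]]}
print(len(unnest_recursive(data, "nested_data")))                           # 2
print(len(unnest_recursive(data, "nested_data", recursive=True, depth=3)))  # 3
```

Without recursion the two inner lists survive as values. With `recursive=True` every scalar becomes its own item, up to the `depth` limit.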
Dictionary Expansion
When unnesting dictionaries, you can use the expand_fields parameter to flatten specific fields into the parent structure:
```yaml
- name: expand_user_data
  type: unnest
  unnest_key: user_info
  expand_fields:
    - name
    - age
    - location
```
```python
frame = frame.unnest(
    name="expand_user_data",
    unnest_key="user_info",
    expand_fields=["name", "age", "location"],
)
```
In this case, name, age, and location would be added as new keys in the parent dictionary, alongside the original user_info key.
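The following plain-Python sketch shows dictionary expansion on the `user_info` example. It copies the listed fields to the top level and leaves the original nested dictionary in place. The function `unnest_dict` is hypothetical. The sketch also assumes that listed fields absent from the nested dictionary are simply skipped, and that a parent key with the same name is overwritten.

```python
import copy
from typing import Any


def unnest_dict(
    item: dict[str, Any], unnest_key: str, expand_fields: list[str]
) -> dict[str, Any]:
    """Hypothetical sketch: copy selected fields of a nested dict to the top level."""
    new_item = copy.deepcopy(item)
    nested = new_item[unnest_key]  # the original nested dict stays in place
    for field in expand_fields:
        if field in nested:  # assumption: fields missing from the nested dict are skipped
            # A parent key with the same name would be overwritten here.
            new_item[field] = nested[field]
    return new_item


record = {
    "id": 1,
    "user_info": {"name": "Ada", "age": 36, "location": "London", "email": "ada@example.com"},
}
flat = unnest_dict(record, "user_info", ["name", "age", "location"])
print(sorted(flat))  # ['age', 'id', 'location', 'name', 'user_info']
```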
Best Practices
- Consider Data Volume: Unnesting multiplies the number of items in your data stream; design subsequent operations accordingly.
- Use Expand Fields Wisely: When unnesting dictionaries with `expand_fields`, watch for key conflicts with the parent dictionary.
- Handle Empty Arrays: Decide whether empty arrays should be kept (`keep_empty`) based on how subsequent operations handle null values.
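Two of these checks can be run on your data before the pipeline does. The first helper estimates how many items list-type unnesting will emit, which bounds the number of downstream calls. The second lists `expand_fields` that already exist as parent keys. Both helpers, `expanded_item_count` and `conflicting_keys`, are hypothetical and are not part of DocETL. They assume the list and dictionary semantics described on this page.

```python
from typing import Any


def expanded_item_count(
    items: list[dict[str, Any]], unnest_key: str, keep_empty: bool = False
) -> int:
    """Hypothetical helper: how many items list-type unnesting would emit."""
    total = 0
    for item in items:
        n = len(item[unnest_key])
        # An empty array yields one item only when keep_empty is set.
        total += n if n else int(keep_empty)
    return total


def conflicting_keys(items: list[dict[str, Any]], expand_fields: list[str]) -> list[str]:
    """Hypothetical helper: expand_fields that already exist as parent keys."""
    return sorted({field for item in items for field in expand_fields if field in item})


orders = [{"order_id": 1, "products": ["x", "y", "z"]}, {"order_id": 2, "products": []}]
print(expanded_item_count(orders, "products"))                   # 3
print(expanded_item_count(orders, "products", keep_empty=True))  # 4

users = [{"name": "acct-1", "user_info": {"name": "Ada", "age": 36}}]
print(conflicting_keys(users, ["name", "age"]))  # ['name']
```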