Unnest Operation

The Unnest operation expands an array field into multiple items, one per element, or lifts selected fields of a nested dictionary into the parent item, so that nested values can be processed separately.

flowchart LR
    d["doc with items=[a, b]"] --> r1["doc with items=a"]
    d --> r2["doc with items=b"]

How Unnest Works

The Unnest operation behaves differently depending on the type of data being unnested:

  • For list-type unnesting: It replaces the original key with each individual element from the list.
  • For dictionary-type unnesting: It adds new keys to the parent dictionary based on the expand_fields parameter.

Unnest does not have an output schema. It modifies the structure of your data in place.
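As a rough illustration of the list-type case, the sketch below mirrors the flowchart above in plain Python. It is not DocETL's implementation; the helper name `unnest_list` and the sample document are hypothetical and only show the shape of the transformation.

```python
def unnest_list(item: dict, unnest_key: str) -> list[dict]:
    """Return one shallow copy of `item` per element of item[unnest_key]."""
    outputs = []
    for element in item[unnest_key]:
        # Copy every other key-value pair, then replace the list with one element.
        outputs.append({**item, unnest_key: element})
    return outputs


doc = {"id": 1, "items": ["a", "b"]}
rows = unnest_list(doc, "items")
print(rows)
# [{'id': 1, 'items': 'a'}, {'id': 1, 'items': 'b'}]
```

The input document is left untouched in this sketch; each output row carries the same `id` alongside a single element of `items`.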

Configuration

Required Parameters

| Parameter | Description |
| --- | --- |
| type | Must be set to "unnest" |
| name | A unique name for the operation |
| unnest_key | The key of the array or dictionary field to unnest |

Optional Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| keep_empty | If true, items whose array is empty are kept in the output, with the unnest_key value set to None | false |
| expand_fields | When unnesting a dictionary, the list of fields to copy from the nested dictionary into the parent dictionary | [] |
| recursive | If true, the unnest operation is applied recursively to nested arrays | false |
| depth | The maximum depth for recursive unnesting (only applicable if recursive is true) | inf |
| sample | Number of samples to use for the operation | None |
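To make the effect of keep_empty concrete, here is a minimal plain-Python sketch of the behavior described in the table. The function `unnest_with_keep_empty` and the sample posts are hypothetical, written only to illustrate the documented semantics, not taken from DocETL's source.

```python
def unnest_with_keep_empty(items, unnest_key, keep_empty=False):
    """List-type unnest that optionally keeps items whose array is empty."""
    out = []
    for item in items:
        values = item[unnest_key]
        if not values:
            if keep_empty:
                # Keep the item, with the unnested key set to None.
                out.append({**item, unnest_key: None})
            continue
        for value in values:
            out.append({**item, unnest_key: value})
    return out


posts = [
    {"post": "p1", "comments": ["nice", "meh"]},
    {"post": "p2", "comments": []},
]

dropped = unnest_with_keep_empty(posts, "comments")
kept = unnest_with_keep_empty(posts, "comments", keep_empty=True)
print(len(dropped), len(kept))  # 2 3
print(kept[-1])                 # {'post': 'p2', 'comments': None}
```

With the default, post p2 disappears from the output; with keep_empty set, it survives as a single item that downstream operations must be prepared to receive with a None value.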

Output

The Unnest operation modifies the structure of your data:

  • For list-type unnesting: It generates multiple output items for each input item, replacing the original array in the unnest_key field with individual elements.
  • For dictionary-type unnesting: It expands the specified fields into the parent dictionary.

All other original key-value pairs from the input item are preserved in the output.

Note

When unnesting dictionaries, the original nested dictionary is preserved in the output, and the specified fields are expanded into the parent dictionary.

Use Cases

  1. Product Analysis in Orders: Unnest a list of products in each order, then use a map operation to analyze each product individually.
  2. Comment Sentiment Analysis: Unnest a list of comments for each post, enabling sentiment analysis on individual comments.
  3. Nested Data Structure Flattening: Unnest complex nested data structures to create a flattened dataset for easier analysis or processing.
  4. Processing Time Series Data: Unnest time series data stored in arrays to analyze individual time points.

Example: Analyzing Product Reviews

- name: extract_salient_quotes
  type: map
  prompt: |
    For the following product review, extract up to 3 salient quotes that best represent the reviewer's opinion:

    {{ input.review_text }}

    For each quote, provide the text and its sentiment (positive, negative, or neutral).
  output:
    schema:
      salient_quotes: list[string]

- name: unnest_quotes
  type: unnest
  unnest_key: salient_quotes

- name: analyze_quote
  type: map
  prompt: |
    Analyze the following quote from a product review:

    Quote & information: {{ input.salient_quotes }}
    Review text: {{ input.review_text }}

    Provide a detailed analysis of the quote, including:
    1. The specific aspect of the product being discussed
    2. The strength of the sentiment (-5 to 5, where -5 is extremely negative and 5 is extremely positive)
    3. Any key terms or phrases that stand out

  output:
    schema:
      product_aspect: string
      sentiment_strength: number
      key_terms: list[string]

The same pipeline in Python:

import docetl

docetl.default_model = "gpt-4o-mini"

frame = docetl.read_json("reviews.json")
frame = frame.map(
    prompt="""For the following product review, extract up to 3 salient quotes that best represent the reviewer's opinion:

{{ input.review_text }}

For each quote, provide the text and its sentiment (positive, negative, or neutral).""",
    output={"schema": {"salient_quotes": "list[string]"}},
)
frame = frame.unnest(unnest_key="salient_quotes")
frame = frame.map(
    prompt="""Analyze the following quote from a product review:

Quote & information: {{ input.salient_quotes }}
Review text: {{ input.review_text }}

Provide a detailed analysis of the quote, including:
1. The specific aspect of the product being discussed
2. The strength of the sentiment (-5 to 5, where -5 is extremely negative and 5 is extremely positive)
3. Any key terms or phrases that stand out""",
    output={
        "schema": {
            "product_aspect": "string",
            "sentiment_strength": "number",
            "key_terms": "list[string]",
        }
    },
)
rows = frame.collect()

After unnesting, each quote becomes its own item, accessible via input.salient_quotes in the second map.

Advanced Features

Recursive Unnesting

When dealing with deeply nested structures, you can use the recursive parameter to apply the unnest operation at multiple levels:

- name: recursive_unnest
  type: unnest
  unnest_key: nested_data
  recursive: true
  depth: 3 # Limit recursion to 3 levels deep

The same operation in Python:

frame = frame.unnest(
    name="recursive_unnest",
    unnest_key="nested_data",
    recursive=True,
    depth=3,  # Limit recursion to 3 levels deep
)
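The sketch below shows one plausible reading of recursive unnesting with a depth limit, in plain Python. It assumes that `depth` counts how many levels of nested lists are expanded and that non-list values stop the recursion; both are assumptions for illustration, and `unnest_recursive` is a hypothetical helper rather than DocETL's implementation.

```python
import math


def unnest_recursive(item, unnest_key, depth=math.inf):
    """Expand nested lists under `unnest_key`, at most `depth` levels deep."""
    value = item[unnest_key]
    if not isinstance(value, list) or depth <= 0:
        return [item]  # nothing left to expand, or depth limit reached
    out = []
    for element in value:
        child = {**item, unnest_key: element}
        out.extend(unnest_recursive(child, unnest_key, depth - 1))
    return out


doc = {"id": 1, "nested_data": [[1, 2], [3, [4, 5]]]}

full = unnest_recursive(doc, "nested_data")
two_levels = unnest_recursive(doc, "nested_data", depth=2)

print([row["nested_data"] for row in full])        # [1, 2, 3, 4, 5]
print([row["nested_data"] for row in two_levels])  # [1, 2, 3, [4, 5]]
```

Under these assumptions, a finite depth leaves any lists below that level intact, which keeps the number of output items bounded when the nesting is irregular.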

Dictionary Expansion

When unnesting dictionaries, you can use the expand_fields parameter to flatten specific fields into the parent structure:

- name: expand_user_data
  type: unnest
  unnest_key: user_info
  expand_fields:
    - name
    - age
    - location

The same operation in Python:

frame = frame.unnest(
    name="expand_user_data",
    unnest_key="user_info",
    expand_fields=["name", "age", "location"],
)

In this case, name, age, and location are added as new keys in the parent dictionary, alongside the original user_info key.
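A small plain-Python sketch of this expansion follows. The helper `expand_dict_fields` and the sample record are hypothetical, and skipping fields that are missing from the nested dictionary is an assumption made for the example, not documented behavior.

```python
def expand_dict_fields(item, unnest_key, expand_fields):
    """Copy selected fields of item[unnest_key] into a copy of the parent."""
    new_item = dict(item)  # the original nested dictionary stays in place
    nested = item[unnest_key]
    for field in expand_fields:
        if field in nested:  # assumption: missing fields are skipped
            new_item[field] = nested[field]
    return new_item


record = {
    "order_id": 42,
    "user_info": {
        "name": "Ada",
        "age": 36,
        "location": "London",
        "email": "ada@example.com",
    },
}

flat = expand_dict_fields(record, "user_info", ["name", "age", "location"])
print(sorted(flat))
# ['age', 'location', 'name', 'order_id', 'user_info']
```

Only the listed fields are lifted: `email` remains reachable through `user_info` but does not become a top-level key.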

Best Practices

  1. Consider Data Volume: Unnesting multiplies the number of items in your data stream; design subsequent operations accordingly.

  2. Use Expand Fields Wisely: When unnesting dictionaries with expand_fields, watch for key conflicts with the parent dictionary.

  3. Handle Empty Arrays: Decide whether empty arrays should be kept (keep_empty) based on how subsequent operations handle null values.
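The first two practices can be checked before running a pipeline. The sketch below uses two hypothetical plain-Python helpers, `estimate_fan_out` and `find_key_conflicts`; neither is part of DocETL, and both assume the input is a list of dictionaries already loaded in memory.

```python
def estimate_fan_out(items, unnest_key):
    """Count how many items a list-type unnest would emit (keep_empty off)."""
    return sum(len(item.get(unnest_key) or []) for item in items)


def find_key_conflicts(item, expand_fields):
    """Return the expand_fields that already exist as keys in the parent."""
    return [field for field in expand_fields if field in item]


orders = [
    {"id": 1, "products": ["a", "b", "c"]},
    {"id": 2, "products": []},
    {"id": 3, "products": ["d"]},
]
print(estimate_fan_out(orders, "products"))  # 4

order = {"name": "order-7", "user_info": {"name": "Ada", "age": 36}}
print(find_key_conflicts(order, ["name", "age"]))  # ['name']
```

A fan-out estimate gives a rough upper bound on how many calls a subsequent map would make, and a non-empty conflict list signals that a parent key would collide with an expanded field.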