Use ai.translate with PySpark

The ai.translate function translates each input row into the target language you choose.

Overview

The ai.translate function is available for Spark DataFrames. You must specify an existing input column name as a parameter, along with a target language.

The function returns a new DataFrame with translations for each input text row, stored in an output column.

Syntax

df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")

Parameters

to_lang (Required): A string that represents the target language for text translations.

input_col (Required): A string that contains the name of an existing column with input text values to translate.

output_col (Optional): A string that contains the name of a new column that stores translations for each input text row. If you don't set this parameter, a default name is generated for the output column.

error_col (Optional): A string that contains the name of a new column that stores any OpenAI errors that result from processing each input text row. If you don't set this parameter, a default name is generated for the error column. If an input row has no errors, the value in this column is null.
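
Because the error column is null for rows that processed without errors, it can be used to separate failed rows from successful ones. The following is a minimal sketch, not part of the documented API: the helper name translate_and_split and the column name translation_errors are hypothetical, and the code assumes it runs in a notebook where df.ai.translate is available.

```python
def translate_and_split(df, to_lang, input_col,
                        output_col="translations",
                        error_col="translation_errors"):
    """Translate input_col, then split the result into succeeded and failed rows.

    Hypothetical helper: the function name and default column names are
    illustrative assumptions, not part of the ai.translate API.
    """
    result = df.ai.translate(
        to_lang=to_lang,
        input_col=input_col,
        output_col=output_col,
        error_col=error_col,  # name the error column explicitly so it can be filtered on
    )
    # The error column is null when a row was processed without errors.
    succeeded = result.filter(f"`{error_col}` IS NULL")
    failed = result.filter(f"`{error_col}` IS NOT NULL")
    return succeeded, failed
```

A call such as translate_and_split(df, "spanish", "text") returns two DataFrames, so the failed rows can be inspected or retried without affecting the rows that translated successfully.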

Returns

The function returns a Spark DataFrame with a new column that contains the translation of each input text row. If the input text is null, the result is null.

Example

# This code uses AI. Always review output for mistakes.

df = spark.createDataFrame([
    ("Hello! How are you doing today?",),
    ("Tell me what you'd like to know, and I'll do my best to help.",),
    ("The only thing we have to fear is fear itself.",),
], ["text"])

translations = df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")
display(translations)

Output:

Screenshot of a data frame with columns 'text' and 'translations'. The 'translations' column contains the text translated to Spanish.
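
Since the function returns a Spark DataFrame, it may be possible to chain calls to add one translation column per target language. The sketch below assumes that the returned DataFrame exposes the same ai accessor as the original; the helper name translate_to_many and the column naming scheme are hypothetical.

```python
def translate_to_many(df, input_col, languages):
    """Add one translation column (and one error column) per target language.

    Hypothetical helper: assumes the DataFrame returned by ai.translate
    can itself be passed to ai.translate again.
    """
    result = df
    for lang in languages:
        suffix = lang.lower().replace(" ", "_")
        # Name both columns explicitly so every call writes to predictable,
        # distinct columns instead of relying on generated defaults.
        result = result.ai.translate(
            to_lang=lang,
            input_col=input_col,
            output_col=f"{input_col}_{suffix}",
            error_col=f"{input_col}_{suffix}_error",
        )
    return result
```

For example, translate_to_many(df, "text", ["spanish", "french"]) would add the columns text_spanish, text_spanish_error, text_french, and text_french_error. Each language is a separate call to the function, so the work grows with the number of languages.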

Multimodal input

To translate images, PDFs, or text files, set input_col_type="path". For setup, see Use multimodal input with AI Functions.

# This code uses AI. Always review output for mistakes.

results = custom_df.ai.translate(
    to_lang="Chinese",
    input_col="file_path",
    input_col_type="path",
    output_col="chinese_version",
)
display(results)