Transforms
Data transformation tools available on OmniAI
Last updated
Each column you define uses a specific tool for its data transformation. The pre-built tools available on the platform are listed below. Custom tools can also be provisioned.
Extract a specific piece of content from an unstructured data field. This tool targets a piece of content (document, text, etc.) and extracts a specific value. You can specify a return type, as well as additional parameters for fine-tuning the result.
Types: STRING, NUMBER, BOOLEAN
Parameters:
MAX_LENGTH - Maximum length of the returned string
MIN_LENGTH - Minimum length of the returned string
MAX - Maximum value of the returned number
MIN - Minimum value of the returned number
DEFAULT - Default value if no result is found (if unspecified, returns <null>)
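To make the interaction between these parameters concrete, here is a minimal Python sketch. It is not OmniAI's implementation: the function name `apply_constraints` is hypothetical, and it assumes that a value violating a constraint falls back to DEFAULT (the platform may behave differently, for example by truncating).

```python
from typing import Optional, Union

Value = Union[str, int, float, bool, None]


def apply_constraints(
    value: Value,
    *,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
    default: Value = None,
) -> Value:
    """Return `value` if it satisfies the constraints, otherwise `default`.

    Assumption: out-of-range results are replaced by DEFAULT.
    """
    if value is None:
        # No result found: fall back to DEFAULT (None stands in for <null>).
        return default
    if isinstance(value, bool):
        # BOOLEAN results have no length or range constraints.
        return value
    if isinstance(value, str):
        # MIN_LENGTH / MAX_LENGTH apply to STRING results.
        if min_length is not None and len(value) < min_length:
            return default
        if max_length is not None and len(value) > max_length:
            return default
        return value
    # MIN / MAX apply to NUMBER results.
    if minimum is not None and value < minimum:
        return default
    if maximum is not None and value > maximum:
        return default
    return value
```

For instance, `apply_constraints(150, maximum=100, default=0)` falls back to the default because the number exceeds MAX.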
Match a piece of content with its most similar category. Categories can be defined in a few ways:
User defined:
The user provides a list of category options.
Ex: When classifying receipts by department, a user might define FINANCE, MARKETING, and HUMAN_RESOURCES, and the content will be matched with the most similar category.
Dynamic:
The user defines a query to get the available categories. This is a common method when there are a large number of available options, or if the options change frequently.
Ex: SELECT DISTINCT type FROM departments;
Auto Detect (BETA):
Let OmniAI scan the column contents and determine a set of categories. This is a powerful tool when you are looking at a column with an unknown mix of data (ex: document URLs).
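The user-defined and dynamic options can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the helper names are hypothetical, the categories are loaded through Python's `sqlite3` module rather than your actual warehouse, and string similarity from `difflib` is only a toy stand-in for whatever matching OmniAI performs.

```python
import sqlite3
from difflib import SequenceMatcher
from typing import List, Sequence


def load_categories(conn: sqlite3.Connection, query: str) -> List[str]:
    """Dynamic categories: run a query and use its first column as the options."""
    return [row[0] for row in conn.execute(query)]


def most_similar_category(content: str, categories: Sequence[str]) -> str:
    """Pick the category whose name is most similar to the content.

    Toy similarity only: character-level ratio between the lowercased
    content and the category name with underscores turned into spaces.
    """
    def score(category: str) -> float:
        name = category.lower().replace("_", " ")
        return SequenceMatcher(None, content.lower(), name).ratio()

    return max(categories, key=score)
```

With user-defined categories you would pass the list directly, e.g. `most_similar_category(text, ["FINANCE", "MARKETING", "HUMAN_RESOURCES"])`; with dynamic categories you would first call `load_categories(conn, "SELECT DISTINCT type FROM departments;")`.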
Extract keywords from arbitrary bodies of text. This is helpful when you want to enable better search across a large volume of documents / text. Keywords are returned as an array of strings.
Parameters:
TEXT_TRANSFORM - How to format the resulting keywords. Defaults to INHERIT, which keeps the casing of the original text. Options: UPPERCASE, LOWERCASE, INHERIT
MAX_KEYWORDS - Maximum number of keywords to return
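As a rough illustration of how these two parameters shape the output, the sketch below post-processes an already extracted keyword list. The function name `format_keywords` is hypothetical, and it assumes MAX_KEYWORDS simply keeps the first N keywords; the platform may rank them differently.

```python
from typing import List, Optional


def format_keywords(
    keywords: List[str],
    text_transform: str = "INHERIT",
    max_keywords: Optional[int] = None,
) -> List[str]:
    """Apply TEXT_TRANSFORM and MAX_KEYWORDS to a list of keywords."""
    if text_transform == "UPPERCASE":
        result = [k.upper() for k in keywords]
    elif text_transform == "LOWERCASE":
        result = [k.lower() for k in keywords]
    elif text_transform == "INHERIT":
        # Keep the casing found in the original text.
        result = list(keywords)
    else:
        raise ValueError(f"Unknown TEXT_TRANSFORM: {text_transform}")

    if max_keywords is not None:
        # Assumption: keep the first N keywords in their existing order.
        result = result[:max_keywords]
    return result
```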
Generates a summary of the target column. You can control the style and format of the summary.
Parameters:
MAX_LENGTH - Maximum length of the returned string.
MIN_LENGTH - Minimum length of the returned string.
PARAGRAPH (default) - A single paragraph summarizing the content.
BULLETS - A list of bullet points summarizing the content.
HEADLINE - A single sentence summary of the content.
DEFAULT - Default value if no result is found (if unspecified, returns <null>).
Run a sentiment analysis on a particular piece of content. This can analyze the entire body of content, or only the parts that refer to a specific concept in the document.
For concept-specific sentiment, the tool runs classification only on the portion of the document matching your concept (e.g. the sentiment of text mentioning "tesla stock").
Types:
Enum: POSITIVE, NEGATIVE, NEUTRAL, UNKNOWN
Scale: -1 to 1, with 1 being the most positive.
Parameters:
TYPE - Enum or Scale. Defaults to Enum.
CONCEPT - Target content for sentiment analysis. Ex: "tesla stock"
DEFAULT - Default value if no result is found (if unspecified, returns <null>)
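The two result types can be modeled as follows. This is a sketch under assumptions, not the platform's API: `Sentiment` and `parse_result` are hypothetical names, and the code only checks that a raw result fits the documented Enum values or the documented -1 to 1 range.

```python
from enum import Enum
from typing import Optional, Union


class Sentiment(Enum):
    POSITIVE = "POSITIVE"
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    UNKNOWN = "UNKNOWN"


def parse_result(
    raw: Union[str, float, None],
    type_: str = "ENUM",
    default: Optional[object] = None,
) -> object:
    """Validate a raw sentiment result against the chosen TYPE."""
    if raw is None:
        # No result found: fall back to DEFAULT (None stands in for <null>).
        return default
    if type_ == "ENUM":
        # Raises ValueError if the label is not one of the four Enum values.
        return Sentiment(str(raw).upper())
    if type_ == "SCALE":
        score = float(raw)
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"Scale score out of range: {score}")
        return score
    raise ValueError(f"Unknown TYPE: {type_}")
```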
Summarize audio data over time into chapters. Chapters make it easy for users to navigate and find specific information. Chaptering by timestamp is only available if the transcripts contain timestamp values; otherwise, chaptering will return a character count.
Each chapter contains the following:
One-line summary
Start and end timestamps (or character count)
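One way to picture a chapter record is the sketch below. The field names, the `unit` flag, and the `find_chapter` helper are all assumptions made for illustration; the actual output shape may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chapter:
    summary: str            # one-line summary
    start: float            # start timestamp in seconds, or character offset
    end: float              # end timestamp in seconds, or character offset
    unit: str = "seconds"   # "seconds" when timestamps exist, else "characters"


def find_chapter(chapters: List[Chapter], position: float) -> Optional[Chapter]:
    """Return the chapter covering `position`, or None if none does."""
    for chapter in chapters:
        if chapter.start <= position < chapter.end:
            return chapter
    return None
```

A player or search UI could use a lookup like `find_chapter` to jump from a timestamp to the matching summary.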
The content moderation model lets you detect inappropriate content in text/documents to ensure that your content is safe for all audiences. For a given piece of content, OmniAI will return a label along with a confidence score.
Labels: hate, harassment, threatening, self-harm, sexual, sexual/minors, violence, insult, identity_hate
Parameters:
CONFIDENCE SCORE - Threshold for flagging hazardous content.
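The thresholding step can be sketched as below. The input shape (a mapping from label to confidence score) and the 0 to 1 score range are assumptions, as is the function name `flagged_labels`.

```python
from typing import Dict, List


def flagged_labels(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the labels whose confidence meets or exceeds the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)
```

Raising the threshold flags less content with higher certainty, while lowering it flags more content at the cost of more false positives.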
Translate columns between a variety of languages. OmniAI supports direct translation to 40 languages. For more information about language support, see Language Support.
Provide a custom LLM prompt to extract or generate values in a specific format. This tool is valuable when you want to transform your data for a very specific use case. Ex: generating custom email copy based on information from your CRM notes.
PROMPT: A string describing what transformation you would like to perform. You can reference columns within your table via template literals.
TYPE: One of the following: STRING, NUMBER, BOOLEAN, ENUM
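To show what referencing columns from a prompt could look like, here is a hypothetical Python sketch. The `{{column_name}}` placeholder syntax, the `render_prompt` helper, and the column names are all assumptions; consult the platform for the actual template literal syntax.

```python
import re
from typing import Mapping

# Assumed placeholder syntax: {{column_name}}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: Mapping[str, object]) -> str:
    """Substitute column references in the prompt with values from one row."""
    def substitute(match: "re.Match[str]") -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Prompt references unknown column: {column}")
        return str(row[column])

    return _PLACEHOLDER.sub(substitute, template)
```

Failing loudly on an unknown column is a deliberate choice in this sketch, so that a typo in the prompt does not silently produce an incomplete instruction.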
Prompting example:
The PII Redaction tool lets you minimize sensitive information about individuals by automatically identifying and removing it from your content. There are two options for PII redaction: HASH substitution and ENTITY substitution.
With HASH substitution: Hi, my name is ####!
With ENTITY substitution: Hi, my name is [PERSON_NAME]!
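The difference between the two modes can be sketched in Python as follows. This assumes the PII spans have already been detected and are passed in as `(start, end, entity_type)` tuples; the `redact` function and the fixed `####` mask are illustrative assumptions, not the platform's implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type), assumed detector output


def redact(text: str, spans: List[Span], mode: str = "ENTITY") -> str:
    """Replace each detected span with a hash mask or an entity placeholder."""
    if mode not in ("HASH", "ENTITY"):
        raise ValueError(f"Unknown mode: {mode}")
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, entity_type in sorted(spans, reverse=True):
        replacement = "####" if mode == "HASH" else f"[{entity_type}]"
        text = text[:start] + replacement + text[end:]
    return text
```

HASH substitution hides both the value and its kind, while ENTITY substitution keeps the kind of information visible, which can help downstream analysis.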