Transforms

Data transformation tools available on OmniAI

Each column you define uses a specific tool to transform its data. The following is a list of the pre-built tools available on the platform; custom tools can also be provisioned.

Extraction

Extract a specific piece of content from an unstructured data field. This tool targets a piece of content (document, text, etc.) and extracts a specific value. You can specify a return type, as well as additional parameters for fine-tuning the result.

  • Types: STRING, NUMBER, BOOLEAN

  • Parameters:

    • MAX_LENGTH - Maximum length of the returned string

    • MIN_LENGTH - Minimum length of the returned string

    • MAX - Maximum value of the returned number

    • MIN - Minimum value of the returned number

    • DEFAULT - Default value if no result is found (if unspecified, returns <null>)
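To make these parameters concrete, here is a minimal Python sketch of how MAX_LENGTH, MIN_LENGTH, MAX, MIN, and DEFAULT could be applied to a value that has already been extracted. It assumes, purely for illustration, that a result outside the configured bounds is treated like "no result" and falls back to DEFAULT; how the platform actually handles out-of-range values is not specified here. The function name `apply_extraction_constraints` is hypothetical and not part of any OmniAI API.

```python
from typing import Optional, Union

Value = Union[str, float, bool]


def apply_extraction_constraints(
    value: Optional[Value],
    *,
    max_length: Optional[int] = None,  # MAX_LENGTH (strings)
    min_length: Optional[int] = None,  # MIN_LENGTH (strings)
    max_value: Optional[float] = None,  # MAX (numbers)
    min_value: Optional[float] = None,  # MIN (numbers)
    default: Optional[Value] = None,  # DEFAULT; None stands in for <null>
) -> Optional[Value]:
    """Hypothetical post-check on an extracted value.

    Assumption: a value that violates a bound is treated as "no result"
    and replaced by DEFAULT.
    """
    if value is None:
        return default
    if isinstance(value, str):
        if max_length is not None and len(value) > max_length:
            return default
        if min_length is not None and len(value) < min_length:
            return default
    # bool is a subclass of int in Python, so exclude it from numeric checks.
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        if max_value is not None and value > max_value:
            return default
        if min_value is not None and value < min_value:
            return default
    return value


print(apply_extraction_constraints("INV-2041", max_length=12))  # INV-2041
print(apply_extraction_constraints(250.0, max_value=100, default=0))  # 0
print(apply_extraction_constraints(None))  # None
```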

Categorization

Match a piece of content with its most similar category. Categories can be defined in several ways:

  • User defined:

    • The user provides a list of category options.

    • Ex: When classifying receipts by department, a user might define FINANCE, MARKETING, and HUMAN_RESOURCES, and each receipt will be matched with the most similar category.

  • Dynamic:

    • The user defines a query that returns the available categories. This is a common approach when there is a large number of options or when the options change frequently.

    • Ex: SELECT DISTINCT type FROM departments;

  • Auto Detect (BETA):

    • Let OmniAI scan the column contents and determine a set of categories. This is useful for a column that contains an unknown mix of data (ex: document URLs).
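The matching itself is performed by OmniAI; the sketch below only illustrates the idea of "most similar category" using token (Jaccard) overlap between the content and each category name, a deliberately simple stand-in for whatever similarity model the platform uses. It also runs the Dynamic example query against an in-memory SQLite table to obtain the category list. The `departments` table contents and the helper names (`tokens`, `most_similar_category`, `dynamic_categories`) are hypothetical.

```python
import re
import sqlite3


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; non-letters (including "_") act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def most_similar_category(content: str, categories: list[str]) -> str:
    """Pick the category whose name has the highest Jaccard overlap with the content.

    Stand-in similarity for illustration only; ties go to the earliest category.
    """
    content_tokens = tokens(content)

    def jaccard(category: str) -> float:
        category_tokens = tokens(category)
        union = content_tokens | category_tokens
        return len(content_tokens & category_tokens) / len(union) if union else 0.0

    # max() returns the first element with the highest score.
    return max(categories, key=jaccard)


def dynamic_categories(conn: sqlite3.Connection) -> list[str]:
    """Fetch the category options with the query from the Dynamic example."""
    rows = conn.execute("SELECT DISTINCT type FROM departments;").fetchall()
    return [row[0] for row in rows]


# Hypothetical departments table standing in for your real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (type TEXT)")
conn.executemany(
    "INSERT INTO departments (type) VALUES (?)",
    [("FINANCE",), ("MARKETING",), ("HUMAN_RESOURCES",), ("MARKETING",)],
)
categories = dynamic_categories(conn)
print(most_similar_category("Team lunch for the human resources offsite", categories))
# prints: HUMAN_RESOURCES
```

Because the query uses DISTINCT, the duplicate MARKETING row yields a single category option.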

Keyword Detection

Extract keywords from arbitrary bodies of text. This is helpful when you want to enable better search across a large volume of documents or text. Keywords are returned as an array of strings.

  • Parameters:

    • TEXT_TRANSFORM - How to format the resulting keywords. Options: UPPERCASE, LOWERCASE, INHERIT. Defaults to INHERIT, which keeps the casing of the original text.

    • MAX_KEYWORDS - Maximum number of keywords to return

Input: "A man a plan a canal panama."
Output: ["canal", "panama"]
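The two keyword parameters can be sketched as a post-processing step over the extracted keywords. In this hypothetical `format_keywords` helper, keywords are assumed to arrive ranked by relevance, so MAX_KEYWORDS keeps the first N; the keyword extraction itself is left to OmniAI.

```python
from typing import Optional


def format_keywords(
    keywords: list[str],
    text_transform: str = "INHERIT",
    max_keywords: Optional[int] = None,
) -> list[str]:
    """Apply TEXT_TRANSFORM and MAX_KEYWORDS to a list of raw keywords.

    Assumption: keywords are ordered by relevance, so truncation keeps the best.
    """
    transforms = {
        "UPPERCASE": str.upper,
        "LOWERCASE": str.lower,
        "INHERIT": lambda word: word,  # keep the casing from the source text
    }
    if text_transform not in transforms:
        raise ValueError(f"unknown TEXT_TRANSFORM: {text_transform!r}")
    formatted = [transforms[text_transform](word) for word in keywords]
    return formatted if max_keywords is None else formatted[:max_keywords]


raw = ["canal", "panama"]  # keywords from the example above
print(format_keywords(raw, text_transform="UPPERCASE"))  # ['CANAL', 'PANAMA']
print(format_keywords(raw, max_keywords=1))  # ['canal']
```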

Summarization

Generates a summary of the target column. You can control the style and format of the summary.

  • Parameters:

    • MAX_LENGTH - Maximum length of the returned string

    • MIN_LENGTH - Minimum length of the returned string

    • DEFAULT - Default value if no result is found (if unspecified, returns <null>)

  • Formats:

    • PARAGRAPH (default) - A single paragraph summarizing the content.

    • BULLETS - A list of bullet points summarizing the content.

    • HEADLINE - A single-sentence summary of the content.

Sentiment

Run a sentiment analysis on a particular piece of content. It can analyze the entire body of content, or only the parts that refer to a specific concept in the document.

For concept-specific sentiment, the tool runs classification only on the portions of the document that match your concept (e.g., the sentiment of text mentioning "tesla stock").

  • Types:

    • Enum: POSITIVE, NEGATIVE, NEUTRAL, UNKNOWN

    • Scale: A value from -1 to 1, where 1 is the most positive and -1 the most negative.

  • Parameters:

    • TYPE - Enum or Scale. Defaults to Enum.

    • CONCEPT - Target content for sentiment analysis. Ex: "tesla stock"

    • DEFAULT - Default value if no result is found (if unspecified returns <null>)
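The sketch below illustrates the difference between whole-document and concept-specific sentiment, and between the Enum and Scale return types. The tiny word lists are a toy stand-in for OmniAI's sentiment model. The `sentiment` function, its `type_` argument (standing in for TYPE, with assumed values "ENUM" and "SCALE"), and the rules that a tie maps to NEUTRAL while text with no scorable words maps to UNKNOWN are all assumptions made for illustration.

```python
import re
from typing import Optional

# Toy word lists: a stand-in for OmniAI's sentiment model, for illustration only.
POSITIVE_WORDS = {"good", "great", "strong", "gain", "gains", "beat"}
NEGATIVE_WORDS = {"bad", "weak", "loss", "losses", "miss", "drop"}


def concept_sentences(text: str, concept: str) -> list[str]:
    """Keep only the sentences that mention the concept (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if concept.lower() in s.lower()]


def sentiment(text: str, concept: Optional[str] = None, type_: str = "ENUM"):
    """Return a score in [-1, 1] (type_="SCALE") or an enum label (type_="ENUM")."""
    portion = concept_sentences(text, concept) if concept else [text]
    words = re.findall(r"[a-z]+", " ".join(portion).lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos + neg == 0:
        # Assumption: nothing scorable (or no matching sentences) means no result.
        return None if type_ == "SCALE" else "UNKNOWN"
    score = (pos - neg) / (pos + neg)  # always between -1 and 1
    if type_ == "SCALE":
        return score
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


review = "Tesla stock posted strong gains today. The weather was bad."
print(sentiment(review))  # POSITIVE (2 positive words vs 1 negative)
print(sentiment(review, concept="tesla stock", type_="SCALE"))  # 1.0
```

Restricting the input to sentences that mention the concept keeps the unrelated "bad" weather sentence from pulling the concept score down.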

Content Chaptering

Summarize audio data over time into chapters. Chapters make it easy for users to navigate and find specific information. Chaptering by timestamp is only available if the transcripts contain timestamp values; otherwise, chapter boundaries are returned as character counts.

Each chapter contains the following:

  • One-line summary

  • Start and end timestamps (or character count)

{
  "chapters": [
    {
      "summary": "Microeconomics is the social science that studies the...",
      "start": 0,
      "end": 64840
    },
    {
      "summary": "Neoclassical economists make simplifying assumptions about mar..",
      "start": 67010,
      "end": 28840
    }
  ]
}
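Because each chapter carries start and end positions, a client can jump from a playback timestamp (or character offset) to the matching chapter. A minimal sketch, assuming the response arrives as JSON with a `chapters` array whose items have exactly the `summary`, `start`, and `end` fields; `Chapter`, `parse_chapters`, `chapter_at`, and the sample payload are hypothetical.

```python
import json
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chapter:
    summary: str
    start: int  # timestamp, or character offset when transcripts lack timestamps
    end: int


def parse_chapters(payload: str) -> list[Chapter]:
    """Parse a {"chapters": [...]} response into Chapter objects sorted by start."""
    data = json.loads(payload)
    # Assumption: each item has exactly the keys summary, start, and end.
    chapters = [Chapter(**item) for item in data["chapters"]]
    return sorted(chapters, key=lambda c: c.start)


def chapter_at(chapters: list[Chapter], position: int) -> Optional[Chapter]:
    """Return the chapter covering the position, or None if it falls in a gap."""
    starts = [c.start for c in chapters]
    i = bisect_right(starts, position) - 1  # last chapter starting at or before position
    if i >= 0 and chapters[i].start <= position <= chapters[i].end:
        return chapters[i]
    return None


payload = """
{"chapters": [
  {"summary": "Opening remarks", "start": 0, "end": 1000},
  {"summary": "Supply and demand", "start": 1200, "end": 5000}
]}
"""
chapters = parse_chapters(payload)
print(chapter_at(chapters, 3000).summary)  # Supply and demand
print(chapter_at(chapters, 1100))  # None (falls between chapters)
```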

Content Moderation

The content moderation model lets you detect inappropriate content in text and documents to help ensure that your content is safe for all audiences. For a given piece of content, OmniAI returns the detected labels, each with a confidence score.

  • Labels:

    • hate, harassment, threatening, self-harm, sexual, sexual/minors, violence, insult, identity_hate

  • Parameters:

    • CONFIDENCE SCORE - Threshold for flagging hazardous content.

content: "Listen, and understand! That Terminator is out there! It can't 
          be bargained with. It can't be reasoned with. It doesn't feel pity, 
          or remorse, or fear. And it absolutely will not stop... ever, 
          until you are dead!"

response: [
            { label: "violence", confidence: 0.84 },
            { label: "threatening", confidence: 0.52 }
          ]
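Applying the CONFIDENCE SCORE threshold to a response like the one above amounts to a filter over the returned labels. A minimal sketch; the `ModerationResult` type and `flagged_labels` helper are hypothetical, and treating a confidence equal to the threshold as flagged is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ModerationResult:
    label: str
    confidence: float


def flagged_labels(results: list[ModerationResult], threshold: float) -> list[str]:
    """Labels whose confidence meets the threshold, highest confidence first.

    Assumption: a confidence equal to the threshold counts as flagged.
    """
    hits = [r for r in results if r.confidence >= threshold]
    return [r.label for r in sorted(hits, key=lambda r: r.confidence, reverse=True)]


# The labels and scores from the Terminator example above.
response = [
    ModerationResult("violence", 0.84),
    ModerationResult("threatening", 0.52),
]
print(flagged_labels(response, threshold=0.5))  # ['violence', 'threatening']
print(flagged_labels(response, threshold=0.6))  # ['violence']
```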

Translation

Translate columns between a variety of languages. OmniAI supports direct translation to 40 languages. For more information about language support, see Language Support.

Custom Prompts

Provide a custom LLM prompt to extract or generate values in a specific format. This tool is valuable when you need to transform your data for a very specific use case. Ex: generating custom email copy based on information from your CRM notes.

  • PROMPT: A string describing the transformation you would like to perform. You can reference columns within your table via template literals (ex: {{FIRST_NAME}}).

  • TYPE: One of the following: STRING, NUMBER, BOOLEAN, ENUM

Prompting example:

Generate a 2-3 line email for {{FIRST_NAME}} introducing them to OmniAI. 
Pulling data from the following CRM notes: {{NOTES}}
Sign the email with the following: 

John Doe
john@getomni.ai
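Column references such as {{FIRST_NAME}} and {{NOTES}} are filled in for each row before the prompt is sent to the model. The sketch below shows one way to do that substitution with a regular expression; the `render_prompt` helper, the example row values, and the choice to raise an error for an unknown column are assumptions, not documented OmniAI behavior.

```python
import re

# Matches {{COLUMN}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Replace each {{COLUMN}} placeholder with that column's value from the row."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            # Assumption: an unknown column is an error rather than an empty string.
            raise KeyError(f"prompt references unknown column {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Generate a 2-3 line email for {{FIRST_NAME}} introducing them to OmniAI. "
    "Pulling data from the following CRM notes: {{NOTES}}"
)
row = {"FIRST_NAME": "Ada", "NOTES": "Evaluating document extraction tools."}  # hypothetical row
print(render_prompt(template, row))
```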

PII Redaction

The PII Redaction tool lets you minimize sensitive information about individuals by automatically identifying and removing it from your content. There are two options for PII redaction: HASH substitution and ENTITY substitution.

  • With hash substitution: Hi, my name is ####!

  • With entity substitution: Hi, my name is [PERSON_NAME]!
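Once PII has been located, the two substitution styles differ only in what replaces each span. This sketch takes the detected spans as input (detection itself is OmniAI's job) and assumes, based on the "####" example, that HASH substitution writes one # per character. `PiiSpan`, `redact`, and the sample spans are hypothetical, and overlapping spans are not handled.

```python
from typing import NamedTuple


class PiiSpan(NamedTuple):
    start: int  # index of the first character of the PII value
    end: int  # index one past the last character
    entity: str  # entity type, e.g. "PERSON_NAME"


def redact(text: str, spans: list[PiiSpan], mode: str = "ENTITY") -> str:
    """Replace each span with '#' characters (HASH) or an [ENTITY] tag (ENTITY).

    Assumptions: HASH masks one '#' per character; spans do not overlap.
    """
    result = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if mode == "HASH":
            replacement = "#" * (span.end - span.start)
        elif mode == "ENTITY":
            replacement = f"[{span.entity}]"
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        result = result[: span.start] + replacement + result[span.end :]
    return result


text = "Hi, my name is John!"
spans = [PiiSpan(15, 19, "PERSON_NAME")]  # would normally come from the PII detector
print(redact(text, spans, mode="HASH"))  # Hi, my name is ####!
print(redact(text, spans, mode="ENTITY"))  # Hi, my name is [PERSON_NAME]!
```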
