CosineSimilarity

Returns the similarity between two embedding vectors as a number between -1 (opposite) and 1 (similar).

Format 

CosineSimilarity ( v1 ; v2 )

Parameters 

v1 and v2 - any text expression, text field, or container field that contains embedding vectors. In general, the two embedding vectors should be produced by the same model for this function to return a meaningful value.

Data type returned 

number

Originated in version 

21.0

Description 

This function returns a measure of the similarity between two embedding vectors using the cosine method. For embedding vectors, cosine similarity gives a useful measure of how similar two text values are likely to be. Results range from -1 to 1 (inclusive), with values closer to 1 indicating higher semantic similarity, values near 0 indicating little or no similarity, and values closer to -1 indicating opposite meaning.
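
For reference, the cosine method is conventionally defined as the dot product of the two vectors divided by the product of their lengths. The formula below is the standard mathematical definition, not a description of how this function is implemented internally; n stands for the number of dimensions that both vectors share.

```latex
\[
\operatorname{cos\_sim}(v_1, v_2)
  = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}
  = \frac{\sum_{i=1}^{n} v_{1,i} \, v_{2,i}}
         {\sqrt{\sum_{i=1}^{n} v_{1,i}^{2}} \; \sqrt{\sum_{i=1}^{n} v_{2,i}^{2}}}
\]
```

Because the result depends only on the angle between the vectors, scaling either vector by a positive constant does not change it.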

If v1 and v2 are text, they must be in the form of JSON arrays. The two vectors must also have the same number of dimensions (that is, the arrays must contain the same number of elements). In most cases, however, storing embedding vectors as binary data in container fields gives better performance than storing them as text.
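
To illustrate the text form described above, the following is a minimal Python sketch written outside the product. It parses two JSON array strings, checks that they have the same number of elements, and computes the standard cosine similarity. The function name cosine_similarity and its error handling are assumptions made for this illustration; they do not describe how CosineSimilarity itself is implemented or how it reports errors.

```python
import json
import math


def cosine_similarity(v1_text: str, v2_text: str) -> float:
    """Hypothetical helper: cosine similarity of two JSON-array vectors."""
    a = json.loads(v1_text)
    b = json.loads(v2_text)

    # Both vectors must have the same number of elements.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The cosine is undefined when either vector has zero length.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy two-dimensional vectors; real embedding vectors are far longer.
    print(cosine_similarity("[1, 0]", "[1, 0]"))   # same direction
    print(cosine_similarity("[1, 0]", "[0, 1]"))   # perpendicular
    print(cosine_similarity("[1, 0]", "[-1, 0]"))  # opposite direction
```

The three toy calls correspond to the ends and midpoint of the result range: vectors pointing the same way, perpendicular vectors, and vectors pointing in opposite directions.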

Example 1 

CosineSimilarity ( "[0.2198736, -0.4397852, ... ]" ; "[0.2198736, -0.4397852, ... ]" ) returns .24175542211599998499 for a particular model.

Example 2 

CosineSimilarity ( v1 ; v2 ) returns .54682693950088512302 for a particular model when the v1 and v2 fields contain embedding vectors for the text "Claris" and "Claire", respectively.