This n8n template demonstrates how to calculate the evaluation metric "Similarity", which in this scenario measures the consistency of the agent.
The scoring approach is adapted from the open-source evaluations project RAGAS. The source is available at https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_similarity.py
How it works
- This evaluation works best when questions are closed-ended or fact-based, so that a correct answer can deviate little, if at all, from the expected one.
- For our scoring, we generate embeddings for both the AI's response and the ground truth, then calculate the cosine similarity between them.
- A high score indicates LLM consistency with expected results whereas a low score could signal model hallucination.
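
The cosine similarity step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the template's own implementation: the function name `cosine_similarity` and the toy vectors are assumptions made for the example, and in practice the vectors would come from whichever embeddings model the workflow uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors.

    Hypothetical helper for illustration; not part of the n8n template.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


# Toy stand-ins for real embeddings (assumed values, for illustration only).
response_embedding = [0.2, 0.7, 0.1]
ground_truth_embedding = [0.25, 0.65, 0.05]

score = cosine_similarity(response_embedding, ground_truth_embedding)
print(f"Similarity score: {score:.3f}")
```

Vectors pointing in the same direction score close to 1, while unrelated (orthogonal) vectors score close to 0. Any threshold for what counts as a "high" or "low" score depends on the embeddings model and should be chosen by inspecting real evaluation results.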
Requirements