R_m@i (Recall_m@i)
Let {(c_j, r_j), 1 <= j <= n} be the list of n context-response pairs from the test set. For each context c_j, we create a set of m alternative responses: the actual response r_j and m-1 other responses sampled at random from the same corpus. The m alternative responses are then ranked by the scores the conversational model assigns to them, and Recall_m@i measures how often the correct response appears in the top i results of this ranked list, i.e. the fraction of the n test contexts for which r_j is ranked among the top i of its m candidates. Recall_m@i is often used to evaluate retrieval models because several responses may be equally "correct" given a particular context, so for i > 1 the metric does not require the actual response to be ranked first.
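The procedure above can be sketched in Python as follows. This is a minimal illustration, not the original evaluation code: the scoring function `score(context, response)` (higher means the model prefers that response), the function name `recall_m_at_i`, the fixed random seed, and the choice to exclude the true response from the distractor pool are all assumptions made for the example.

```python
import random
from typing import Callable, Sequence, Tuple


def recall_m_at_i(
    pairs: Sequence[Tuple[str, str]],
    corpus: Sequence[str],
    score: Callable[[str, str], float],  # hypothetical model scoring function
    m: int = 10,
    i: int = 1,
    seed: int = 0,
) -> float:
    """Fraction of test contexts whose true response ranks in the top i of m candidates."""
    rng = random.Random(seed)
    hits = 0
    for context, true_response in pairs:
        # Assumption: distractors are drawn from the corpus excluding the true response.
        pool = [r for r in corpus if r != true_response]
        candidates = [true_response] + rng.sample(pool, m - 1)
        # Shuffle first so that score ties are not broken in favor of the true response.
        rng.shuffle(candidates)
        # Rank the m candidates by model score, highest first.
        ranked = sorted(candidates, key=lambda r: score(context, r), reverse=True)
        if true_response in ranked[:i]:
            hits += 1
    return hits / len(pairs)
```

With m = 10, Recall_10@1 asks whether the model puts the true response first among 10 candidates, while Recall_10@5 only asks that it lands in the top half; a model that scores at random would be expected to reach about i/m on either.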
Precision@K
Set a rank threshold K
Compute the fraction (%) of relevant documents in the top K
Ignores documents ranked lower than K
Ex: a ranked list in which 2 of the top 3 documents are relevant, the 4th is not, and the 5th is:
Prec@3 = 2/3 ≈ 0.67
Prec@4 = 2/4 = 0.5
Prec@5 = 3/5 = 0.6
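A minimal Python sketch of Precision@K, assuming the ranking is given as a list of relevance judgments in rank order (True = relevant). The function name `precision_at_k` and the specific list below are illustrative; the list is one hypothetical ordering consistent with the example values, not the ranking from the original slide.

```python
from typing import Sequence


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Documents ranked below k are ignored. If fewer than k documents
    are ranked, the missing positions count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant[:k]) / k


# Hypothetical ranking: relevant documents at ranks 1, 3 and 5.
ranking = [True, False, True, False, True]
print(precision_at_k(ranking, 4))  # 0.5
```

Dividing by k (rather than by the number of documents actually returned) keeps Prec@K comparable across queries, at the cost of penalizing systems that return fewer than K results.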