Utilizing Human Annotations for Complex Evaluation Tasks

Hadas Raviv
AI21 Labs

Bio

Dr. Hadas Raviv is a senior data scientist at AI21 Labs, a unique startup aimed at building AI systems with an unprecedented capacity to understand and generate natural language. She is passionate about language understanding and has dedicated the last 13 years to working on related topics.


Hadas holds an MSc in physics from Tel Aviv University and a PhD in information retrieval from the Technion, where her research focused on entity-based retrieval.

Abstract

Subjective human annotations are often too noisy for evaluation tasks in which a complex, possibly multi-dimensional decision has to be made. In this lightning talk we propose a method for reducing that noise and utilizing the annotations to evaluate the output of machine learning models.


Specifically, we demonstrate its application to evaluating the fluency of machine-generated text. We present empirical results on the effectiveness of our method and show that it can be used to evaluate and compare the quality of different generation models.
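The abstract does not spell out the method itself. As a generic illustration only (not the approach presented in the talk), the following minimal Python sketch shows one common way to reduce annotator noise when aggregating subjective fluency ratings: z-score normalize each annotator's scores to remove per-annotator bias and scale differences, then average the normalized scores per generation model. All names and data below are hypothetical.

# Minimal sketch (not the speaker's actual method): reduce annotator noise by
# z-score normalizing each annotator's fluency ratings, then averaging the
# normalized scores per generated text and per model. Data here is hypothetical.
from collections import defaultdict
from statistics import mean, pstdev

# ratings[annotator][(model, text_id)] = raw fluency score (e.g., 1-5 Likert)
ratings = {
    "annotator_a": {("model_x", 0): 4, ("model_x", 1): 5, ("model_y", 0): 2, ("model_y", 1): 3},
    "annotator_b": {("model_x", 0): 3, ("model_x", 1): 4, ("model_y", 0): 1, ("model_y", 1): 2},
}

def normalize_per_annotator(raw):
    """Map each annotator's scores to z-scores to remove scale/bias differences."""
    normalized = {}
    for annotator, scores in raw.items():
        values = list(scores.values())
        mu, sigma = mean(values), pstdev(values) or 1.0
        normalized[annotator] = {item: (v - mu) / sigma for item, v in scores.items()}
    return normalized

def score_models(raw):
    """Average normalized scores over annotators and texts, yielding one score per model."""
    per_model = defaultdict(list)
    for scores in normalize_per_annotator(raw).values():
        for (model, _text_id), z in scores.items():
            per_model[model].append(z)
    return {model: mean(zs) for model, zs in per_model.items()}

print(score_models(ratings))  # higher = more fluent under this aggregation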
