Post by joitarani333 on Apr 30, 2024 4:48:24 GMT
Public testing was carried out on MERA, an instructional benchmark for Russian-language models. It allows you to evaluate an LLM in various aspects, from the ability to solve mathematical problems to answering ethical questions. We chose this benchmark because it lets us fairly evaluate our model and compare it with others posted on the leaderboard.

Additionally, we created our own benchmark, MTS AI InstructruK. It came about because our team wanted objective metrics, but because a language model's training data may include publicly available benchmarks, the evaluation could be skewed. In addition, our prompt engineers created a benchmark to evaluate the model's ability to solve business problems. We haven't come up with a name for it yet, but you can suggest your options in the comments. This benchmark tests how well the model can analyze a conversation between two people, draw conclusions, and extract important information from the text.

Benchmark structure

Any language model developed to communicate as an agent must be comprehensively trained and able to cope well with a range of functions. In the field of LLM validation there are generally accepted tasks that allow one to assess a model's level of competence; these include, for example, checking the model's erudition, its understanding of the structure of the world, and its knowledge of language (world, commonsense, and linguistic knowledge, respectively).
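To make the idea of per-category evaluation a bit more concrete, here is a minimal sketch of what scoring a model across such task groups could look like. The query_model function, the task schema (category / prompt / reference), and the exact-match scoring are all assumptions for illustration, not the actual MERA or MTS AI setup.

```python
from collections import defaultdict

def query_model(prompt: str) -> str:
    # Hypothetical placeholder: call your model or inference endpoint here.
    raise NotImplementedError

def evaluate(tasks):
    """tasks: iterable of dicts like
    {"category": "math", "prompt": "...", "reference": "..."}
    Returns accuracy per category (math, ethics, dialogue analysis, ...)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for task in tasks:
        answer = query_model(task["prompt"]).strip().lower()
        total[task["category"]] += 1
        # Exact-match scoring; real benchmarks often use task-specific metrics.
        if answer == task["reference"].strip().lower():
            correct[task["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

if __name__ == "__main__":
    sample_tasks = [
        {"category": "math", "prompt": "2 + 2 = ?", "reference": "4"},
    ]
    # print(evaluate(sample_tasks))  # runs once query_model is implemented
```

In practice each category would use its own metric (accuracy, F1, judged quality of free-form answers), but the overall structure of "many task groups, one aggregate report" is the same.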