Skip to content

The Model Quality menu

The Model Quality (or NLU QA) menu allows you to evaluate and monitor over time the quality/relevance/performance of conversational models.

The Model Stats tab

This screen presents graphs to track the evolution of several indicators of the quality of the conversational model:

  • Relevance: the scores of the detection algorithms on intentions (Intent average probability) and on entities (Entity average probability)

  • Traffic / errors: the number of requests to the model (Calls) and the number of errors (Errors)

  • Performance: the response time of the model (Average call duration)

NLP - QA admin interface

The Intent Distance tab

The metrics presented in the table on this page (Occurrences and Average Diff) allow you to identify intentions that are more or less close in the model, in particular to optimize the modeling.

The Model Builds tab

This screen presents statistics on the latest reconstructions of the model. These are therefore indications on the performance of the model.

The Partial model tests are a classic way to detect qualification errors, or problems of proximity of intentions (or entities) between them.

This involves taking a part of the current model at random (for example 90% of the sentences of the model) in order to build a slightly less relevant model, then testing the remaining 10% with this new model.

The principle established, all that remains is to repeat the process a certain number of times so that the most frequent errors are presented to a manual corrector.

Note that these tests are only useful with already substantial models.

This tab shows the evolution of the relevance of the partial model tests.

By default, the tests are scheduled to be launched from midnight to 5am, every 10 minutes. It is possible to configure this behavior with the tock_test_model_timeframe property (default: 0.5).

The Test Intent Errors tab

This screen shows the results of the partial tests of intent detection (see above), with the details of the phrases/expressions recognized differently from the real model.

Tock schema

In this example, no "real" error was detected. However, we can see that in some cases the model is systematically wrong, with a high probability.

For each sentence, it is possible via the Actions column to confirm that the basic model is correct (with Validate Intent) or to correct the detected error (Change The Intent).

It is interesting to periodically analyze these differences, some differences being well explained, even being sometimes "assumed" (false negatives), others can reveal a problem in the model.

The Test Entity Errors tab

Like Intent Test Errors for entities, this screen presents the results of partial tests for entity detection.

Tock schema

It is interesting to periodically analyze these differences, some differences being well explained, even being sometimes "assumed" (false negatives), others can reveal a problem in the model.

Continue...

Go to Menu Settings for the rest of the user manual.

You can also go directly to the next chapter: Development.

Chat with Tock
×