Tock Installation
The architecture page presents the Tock functional and technical architecture, the role of the different components as well as the different deployment modes.
This chapter presents the different Tock installation options. In particular, it discusses production installations and shares some feedback on performance, resilience, Tock's ability to scale, cloud-type deployments, monitoring, etc.
If you are only looking to test Tock with non-sensitive data, you may prefer to use the Tock demo platform.
Installation with Docker
The tock-docker repository provides a complete implementation of Tock for the Docker and Docker Compose technologies.
Tock is composed by default of several Docker containers/images and a MongoDB database.
For more information on installing Tock with Docker, see the instructions in the tock-docker repository.
The deploy Tock with Docker guide in the Discover Tock section gives an example of deploying a complete platform in a few minutes with a minimal footprint using Docker and Docker Compose. However, this method is not suitable for a long-term deployment such as a production platform.
If you want to use Docker Compose in production, please read this article and review the configuration, which is only provided as an example in the tock-docker project. In particular, the configuration of the MongoDB instances should be reviewed carefully.
Installation without Docker
It is entirely possible to install Tock without using Docker. By analyzing the descriptors provided in tock-docker (i.e. the `pom.xml` files, the `Dockerfile` and `docker-compose.yml`), one can easily design a Docker-free installation.
Except for the MongoDB database, all other components can be started like classic Java/JVM applications, for example:
- directly from the command line
- within a Java application server
- from an integrated development tool (IDE)
- etc.
To learn more about the launch parameters of the different Tock components, you can take inspiration from the commands in the tock-docker descriptors or from the configurations provided for IntelliJ (see below).
Command line
One technique is to gather the different dependencies and JAR archives in a folder, then start the component or application with a classic Java command. For example, the tock-docker-nlp-api component (see its `pom.xml`) starts with the following command:
java $JAVA_ARGS -Dfile.encoding=UTF-8 -cp '/maven/*' ai.tock.nlp.api.StartNlpServiceKt
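As a sketch of this approach (the module path, output folder, and Maven invocation are illustrative and assume a standard Maven checkout of the Tock sources), the dependencies could be gathered with the standard maven-dependency-plugin before launching:

```shell
# Copy the runtime dependencies of the NLP-API module into a single folder
# (module path and target folder are illustrative)
mvn -pl nlp/api/service dependency:copy-dependencies \
    -DincludeScope=runtime -DoutputDirectory=/maven
# Copy the module's own JAR next to its dependencies
cp nlp/api/service/target/*.jar /maven/
# Start the component with the classpath pointing at that folder
java $JAVA_ARGS -Dfile.encoding=UTF-8 -cp '/maven/*' ai.tock.nlp.api.StartNlpServiceKt
```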
Executable JAR
This is not the technique we recommend, but it is possible to run a single JAR containing all dependencies (sometimes called "fat JAR"). Here is how to create such a JAR, using the example of the Tock NLP-API component.
In the component POM (`nlp/api/service/pom.xml`), add the following declaration:
<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
          <configuration>
            <archive>
              <manifest>
                <mainClass>ai.tock.nlp.api.StartNlpServiceKt</mainClass>
              </manifest>
            </archive>
            <descriptors>
              <descriptor>src/main/assembly/jar-with-dependencies.xml</descriptor>
            </descriptors>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
Also create an archive descriptor `nlp/api/service/src/main/assembly/jar-with-dependencies.xml` with the following content:
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
  <id>jar-with-dependencies</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <unpack>true</unpack>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
  <containerDescriptorHandlers>
    <containerDescriptorHandler>
      <!-- Merge service implementations from dependencies -->
      <handlerName>metaInf-services</handlerName>
    </containerDescriptorHandler>
  </containerDescriptorHandlers>
</assembly>
Finally, build the "jar-with-dependencies" archive with `mvn package`.
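For instance, assuming the declarations above have been added, the build and launch could look like this (the exact artifact name depends on the module version, hence the glob):

```shell
# Build the NLP-API module and its dependencies from the Tock source root
mvn package -pl nlp/api/service -am -DskipTests
# Run the resulting fat JAR (artifact name is illustrative)
java $JAVA_ARGS -Dfile.encoding=UTF-8 \
    -jar nlp/api/service/target/*-jar-with-dependencies.jar
```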
In an IDE
For development, it is possible to run the different Tock components (NLU, Studio, bot...) from an IDE like IntelliJ, Eclipse or Visual Studio Code for example.
In addition to the Docker images, configurations for IntelliJ are provided with the Tock sources:
- Configuration Full Tock Studio services (Bot + NLP) / `BotAdmin`
- Configuration Tock Studio services (NLP only) / `Admin`
- Configuration NLP service / `NlpService`
- Configuration Entity service / `Duckling`
- Configuration NLP model construction service / `BuildWorker`
- Configuration Script compilation service / `KotlinCompilerServer`
Finally, to launch the user interfaces (Tock Studio), the commands are described in the following link:
- Instructions Full Tock Studio interface (Bot + NLP)
MongoDB database
Replica set architecture
The MongoDB database must be configured as a replica set, because Tock takes advantage of change streams.
This implies deploying at least 3 nodes, which also improves resilience.
Different scenarios are possible for the database:
- Install MongoDB nodes on one or more servers (classic method)
- Instantiate MongoDB nodes with Docker (for testing or local development)
- Use a MongoDB cloud service in SaaS (Software-as-a-Service), for example MongoDB Atlas available on AWS, Azure and GCP
A replica set installation tutorial is available on the MongoDB website.
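For a quick local test (not for production: no authentication, single machine), a 3-node replica set could be sketched as follows; data paths and ports are illustrative:

```shell
# Start three mongod nodes belonging to the same replica set "rs0"
mongod --replSet rs0 --port 27017 --dbpath /data/rs0 --fork --logpath /data/rs0.log
mongod --replSet rs0 --port 27018 --dbpath /data/rs1 --fork --logpath /data/rs1.log
mongod --replSet rs0 --port 27019 --dbpath /data/rs2 --fork --logpath /data/rs2.log

# Initiate the replica set with the three members
mongosh --port 27017 --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" }
  ]
})'
```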
Data retention
Tock stores different types of data in its database and applies TTLs (Time To Live), so that some data expire and are purged automatically after a certain time.
In practice, the environment variables are read and the TTL indexes are applied when the DAO (Data Access Object) components are initialized, at Tock startup.
Tock's TTLs have default values and are configurable via environment variables. Some apply to a specific Tock component; others must be defined on several components.
Since Tock can be used as a complete conversational platform or for its NLU/NLP part only, the table below indicates which variables are specific to conversational platforms (denoted Bot) and which are usable on all types of platforms (denoted *).
| Platform(s) | Environment variable | Default value | Description | Affected component(s) |
|---|---|---|---|---|
| * | `tock_nlp_classified_sentences_index_ttl_days` | `-1` (no expiration) | Unvalidated sentences (Inbox). | `nlp_api`, `nlp_admin`/`bot_admin`, `worker` |
| * | `tock_nlp_classified_sentences_index_ttl_intent_names` | Empty (all intents) | Unvalidated sentences (Inbox) >> restriction to certain intents, separated by commas (example below). | `nlp_api` |
| * | `tock_nlp_log_index_ttl_days` | `7` | NLP logs: sentence, intents, scores, entity details, etc. | `nlp_api` |
| * | `tock_nlp_log_stats_index_ttl_days` | `365` | NLP statistics: number of occurrences of a sentence, scores, etc. | `nlp_api` |
| * | `tock_user_log_index_ttl_days` | `365` | Log of actions in Tock Studio: story changes, etc. | `nlp_admin`/`bot_admin` |
| Bot | `tock_bot_alternative_index_ttl_hours` | `1` | Index on label alternatives (Answers). | `bot`/`bot_api` |
| Bot | `tock_bot_dialog_index_ttl_days` | `7` | Conversations (Analytics > Users/Search). | `bot`/`bot_api`, `nlp_admin`/`bot_admin` |
| Bot | `tock_bot_dialog_max_validity_in_seconds` | `60 * 60 * 24` (24h) | Conversation contexts (current intent, entities on the bus, etc.). | `bot`/`bot_api`, `nlp_admin`/`bot_admin` |
| Bot | `tock_bot_flow_stats_index_ttl_days` | `365` | Browsing statistics (Analytics > Activity/Behavior). | `bot`/`bot_api`, `nlp_admin`/`bot_admin` |
| Bot | `tock_bot_timeline_index_ttl_days` | `365` | User profiles/history: preferences, locale, last login, etc. (excluding conversation details). | `bot`/`bot_api`, `nlp_admin`/`bot_admin` |
Depending on the deployment mode used, these environment variables can be added either directly on the command line, or in a descriptor such as `docker-compose.yml`, `dockerrun.aws.json` or other (example below).
It is possible to automatically remove unvalidated sentences (Inbox) for certain intents only, thanks to `tock_nlp_classified_sentences_index_ttl_intent_names`:
`docker-compose.yml`:
version: "3"
services:
  admin_web:
    image: tock/bot_admin:$TAG
    environment:
      - tock_nlp_classified_sentences_index_ttl_days=10
      - tock_nlp_classified_sentences_index_ttl_intent_names=greetings,unknown
`dockerrun.aws.json`:
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "admin_web",
      "image": "tock/bot_admin:${TAG}",
      "environment": [
        {
          "name": "tock_nlp_classified_sentences_index_ttl_days",
          "value": "10"
        },
        {
          "name": "tock_nlp_classified_sentences_index_ttl_intent_names",
          "value": "greetings,unknown"
        }
      ]
    }
  ]
}
In this example, only sentences detected as `greetings` or `unknown` intents (but not validated) will be deleted after 10 days; other sentences will not be deleted.
Sentences validated by a user in Tock Studio, and therefore integrated into the bot's NLP model, never expire by default (although they can still be deleted from the model via the Search > Status: Included in model view): it is therefore important not to validate sentences containing personal data, for example.
Data retention, encryption and anonymization are essential to protect data, especially if it is personal. For more information, see the Security > Data section.
Application Components
Among Tock's application components, whether mandatory or optional, some must run as a single instance while others can be deployed in multiple instances (see the high availability section for more information).
For convenience, the components below are named after the Docker images provided with Tock, although using Docker is not required to install Tock.
Network Exposure
By default, the components or containers of the Tock platform must not be exposed outside the VPN or VPC. Only the bot itself must be accessible to the external partners and channels it integrates with, so that their WebHooks can function.
| Component / Image | Network exposure | Description |
|---|---|---|
| `tock/bot_admin` | VPN / VPC only | Tock Studio interfaces and tools |
| `tock/build_worker` | VPN / VPC only | Automatically rebuilds models whenever needed |
| `tock/duckling` | VPN / VPC only | Parses dates and primitive types using Duckling |
| `tock/nlp_api` | VPN / VPC only | Parses sentences from models built in Tock Studio |
| `tock/bot_api` | VPN / VPC only | API for developing bots (Tock Bot API mode) |
| `tock/kotlin_compiler` | VPN / VPC only | (Optional) Script compiler to enter scripts directly in the Build interface of Tock Studio |
| bot (not provided) | Internet / partners | The bot itself, implementing the programmatic journeys, accessible to external partners/channels via WebHooks |
Of course, the implementation of the bot itself is not provided with Tock (everyone implements their own features for their needs).
HTTP Proxies
The Java system properties `https.proxyHost`, `http.proxyHost` and `http.nonProxyHosts` are the recommended way to configure a proxy.
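For example, with the command-line launch described above, these properties could be passed through `JAVA_ARGS` (host names and port are illustrative):

```shell
# Route outbound HTTP(S) traffic through a proxy; values are illustrative
JAVA_ARGS="$JAVA_ARGS -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128"
JAVA_ARGS="$JAVA_ARGS -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128"
# Do not proxy local or internal hosts
JAVA_ARGS="$JAVA_ARGS -Dhttp.nonProxyHosts=localhost|*.internal.example.com"
```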
Bot Packaging
A sample bot in Tock Bot Embedded mode is available in `docker-compose-bot-open-data.yml`.
Examples and guidelines for packaging bots in Tock Bot API mode (WebHooks, WebSockets) will be available soon.
Minimum configurations
Tock architecture is composed of several components that can be deployed together on the same server, or distributed across multiple machines/instances.
The main parameter to monitor is the available RAM.
Model building
The larger your models, the more memory is needed to rebuild them (`tock/build_worker` component).
To give an order of magnitude, a model of 50,000 sentences with several intents and about twenty entities will require provisioning about 8 GB of RAM for the `tock/build_worker` component.
However, large models with few entities can easily run with only 1 GB of RAM.
JVM & Docker Memory
To ensure that Docker containers/instances do not exceed the available memory, it is recommended to limit the memory of the JVMs, for example:
JAVA_ARGS=-Xmx1g -XX:MaxMetaspaceSize=256m
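With Docker, the container memory limit can then be set slightly above the JVM limits, so that heap, metaspace and off-heap overhead still fit (image tag and limit are illustrative):

```shell
# Cap the container at 1.5 GB while the JVM heap is capped at 1 GB
docker run -d \
  -e JAVA_ARGS="-Xmx1g -XX:MaxMetaspaceSize=256m" \
  --memory=1.5g \
  tock/nlp_api:latest
```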
Machine optimization
It is possible to optimize deployments and infrastructures by taking into account different elements such as:
- the needs of the respective components in machine resources: CPU, memory, disk
- the interest of having one or more instances of each component according to its role
- the constraints/objectives of resilience and high availability
- the cost models, particularly among public cloud providers
Examples
For information, here are some examples of configurations currently in production. These are the "application" components of the Tock architecture without the MongoDB database.
EC2 instance types are for reference only. Tock has no dependencies on AWS. For more information, see the AWS documentation.
Limited Size Models
| Tock Components | Number of Instances | Number of CPUs or vCPUs | RAM | Example EC2 Instance Type |
|---|---|---|---|---|
| `admin-web` + `build-worker` + `kotlin-compiler` + `duckling` | 1 | 2 | 4 GB | t3a.medium (general purpose) |
| `bot` + `nlp-api` + `duckling` | 3 | 2 | 4 GB | t3a.medium (general purpose) |
Large Models
| Tock Components | Number of Instances | Number of CPUs or vCPUs | RAM | Example EC2 Instance Type |
|---|---|---|---|---|
| `admin-web` + `build-worker` + `kotlin-compiler` + `duckling` | 1 | 2 | 16 GB | r5a.large (memory optimized) |
| `bot` + `nlp-api` + `duckling` | 3 | 2 | 4 GB | t3a.medium (general purpose) |
Frequently Asked Questions
Making the administration interface available in a subdirectory
By default, the administration interface is served at the root (example: `https://[domain host]`).
If you want to make it available on a relative path (`https://[domain host]/tock`), set the environment variable `botadminverticle_base_href` in the configuration of the `tock/bot_admin` Docker image. For example: `botadminverticle_base_href=tock`.
For `tock/nlp_admin`, use the property `adminverticle_base_href` instead.
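As an illustration with Docker (image tag and port mapping are illustrative), this could be set as follows:

```shell
# Serve Tock Studio under /tock instead of the root
docker run -d \
  -e botadminverticle_base_href=tock \
  tock/bot_admin:latest
```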
See also...
For a production use of Tock, we recommend that you also browse the following pages: