neatComponents - Train a Model

neatComponents is the hybrid-cloud database engine that powers clearString.
Train a Model

Once a Model has been created, it can be Trained.

Training consists of presenting the model with a set of images and, for each image, identifying which category it belongs to. Each image can belong to only one category.

Training images

Number of images to use for training

There is no set minimum or maximum number of images that should be used for training. The optimum number depends on the nature of the images and on how distinct the categories are from one another. As a rough guide, somewhere between 10 and 50 images for each category will likely give good results.

Number of categories

There is no limit to the number of categories. The minimum is two, but there could be hundreds or even thousands of different categories.

Note that as far as the ML model is concerned, a category is identified simply by its name, supplied as a text string. When the model classifies an image, it returns the predicted category as a text string. For further processing, this string can be matched against the name field of a clearString table that holds one record per category. The ML system is not concerned with this aspect and works only with text strings for the categorisation.
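As an illustration of the matching step described above, the following Python sketch looks up a predicted category string in a set of category records. It is not neatComponents syntax: the record layout, the field names "id" and "name", and both function names are hypothetical, and it assumes an exact string match.

```python
from typing import Optional

# Hypothetical category records, standing in for the rows of a table
# that holds one record per category.
CATEGORY_RECORDS = [
    {"id": 1, "name": "Cat"},
    {"id": 2, "name": "Dog"},
]


def build_name_index(records: list) -> dict:
    """Index the category records by their name field."""
    return {record["name"]: record for record in records}


def match_prediction(predicted: str, index: dict) -> Optional[dict]:
    """Return the record whose name equals the predicted string, or None."""
    return index.get(predicted)


index = build_name_index(CATEGORY_RECORDS)
record = match_prediction("Dog", index)
```

Because the model works only with strings, the category name used in training has to be identical to the name held in the table for the lookup to succeed.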

Testing images

In addition to accepting a set of categorised images for training, the action also accepts a separate set for testing. While this is optional, it is highly recommended, because it gives numerical feedback on how effectively the model is expected to identify images within each category.

Typically the testing set is smaller than the training set. For example, if you are training with 50 images per category, the testing set might be an additional 5 images per category.

For the testing metrics to be valid, it is important that the training and testing images are different from each other, i.e. the model must not have been trained on any of the images that are presented for testing.
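One way to guarantee that the two sets do not overlap is to hold back a few images per category before training. The Python sketch below illustrates the idea only; it is not part of neatComponents. The (image_id, classification) tuples, the function names, and the default of 5 held-back images are assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_category(records, test_per_category=5, seed=0):
    """Split (image_id, classification) pairs into training and testing lists.

    A fixed number of images per category is held back for testing, so no
    image appears in both lists.
    """
    by_category = defaultdict(list)
    for image_id, classification in records:
        by_category[classification].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    training, testing = [], []
    for classification, image_ids in sorted(by_category.items()):
        unique_ids = sorted(set(image_ids))  # drop accidental duplicates
        rng.shuffle(unique_ids)
        held_back = unique_ids[:test_per_category]
        kept = unique_ids[test_per_category:]
        testing += [(i, classification) for i in held_back]
        training += [(i, classification) for i in kept]
    return training, testing


def overlap(training, testing):
    """Return the image ids that appear in both sets (should be empty)."""
    return {i for i, _ in training} & {i for i, _ in testing}
```

With 55 images in each of two categories, this would give 50 training and 5 testing images per category, and overlap() would return an empty set.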

To Train a model

Use the ML - Train Model event action

This action must run in background execution. The time it takes to run is roughly proportional to the number of images presented for training, and also depends on the speed of the underlying server hardware. It could therefore take several seconds or even minutes to complete. Consider building in logic that uses the Diagnostics fields to detect when training has completed, so that users do not attempt to use the model before it is ready.
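The "wait until training has completed" logic can be sketched as a simple polling loop. The Python below is only an illustration of that logic, not neatComponents configuration: read_completed_ok is a hypothetical callable standing in for however your application reads the field that stores Completed OK, and the timeout and polling interval are arbitrary.

```python
import time


def wait_for_training(read_completed_ok, timeout_seconds=600.0,
                      poll_seconds=5.0, sleep=time.sleep,
                      clock=time.monotonic):
    """Poll until the completion flag is set or the timeout expires.

    read_completed_ok is a hypothetical callable that returns True once
    the field storing Completed OK has been set.
    Returns True if training completed in time, otherwise False.
    """
    deadline = clock() + timeout_seconds
    while True:
        if read_completed_ok():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use they can be left at their defaults.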

On the Configuration tab:

Input

Model to Train

The ModelID of the Model that will be trained. Use the ML - Create Model action to obtain this ModelID and store it in a table field for future reference.

Training Data

A query that contains a set of records. The sequence of the records is not important.

The records should contain two fields:

Image (type image) - The image to be trained on
Classification (type text) - The category that the image is classified as
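Before running the action, it can be useful to sanity-check the records the query will supply. The Python sketch below shows one possible set of checks, assuming each record is represented as a dictionary with "Image" and "Classification" keys. The function name is hypothetical, and the threshold of 10 images per category is only the rough guide given earlier, not a rule enforced by the system.

```python
from collections import Counter


def check_training_records(records, min_per_category=10):
    """Return a list of human-readable problems found in the records."""
    problems = []
    counts = Counter()
    for position, record in enumerate(records):
        image = record.get("Image")
        classification = (record.get("Classification") or "").strip()
        if not image:
            problems.append(f"record {position}: missing Image")
        if not classification:
            problems.append(f"record {position}: missing Classification")
        if image and classification:
            counts[classification] += 1  # count only complete records

    # The minimum number of categories is two.
    if len(counts) < 2:
        problems.append("fewer than two categories")

    # Flag categories that fall below the rough guide.
    for name, count in sorted(counts.items()):
        if count < min_per_category:
            problems.append(
                f"category {name!r}: only {count} images "
                f"(rough guide is {min_per_category} or more)"
            )
    return problems
```

An empty list means none of these checks found a problem; it says nothing about how well the model will perform.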

Testing Data

A query that contains a set of records. The sequence of the records is not important.

The records should contain two fields:

Image (type image) - The image the model is to be tested on
Classification (type text) - The category that the image is classified as

Diagnostics

This section gives the standard event action diagnostic feedback.

Completed OK

Store in a Checkbox field

Feedback

Store in a Text field

Testing Summary

Store in a Large Text field

This provides detailed feedback on the training, including an assessment of how well it has been trained, based on how well it recognised the testing images.

Retraining

It is quite usual to improve a model iteratively by adding images and retraining. To do this, add more images to the dataset in your tables, and then re-run the training action. The AI model will forget everything from its earlier training and retrain from the images now presented to it. This means that you must present the complete dataset each time you retrain; you cannot provide only the additions.
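The point that retraining needs the complete dataset can be illustrated with a short Python sketch. It is not neatComponents syntax: the (image_id, classification) tuples and the function name are hypothetical, and letting a later entry replace an earlier one for the same image is an assumption made for the example.

```python
def build_retraining_set(existing, additions):
    """Combine the earlier training records with the new ones.

    Each image appears once in the result. If an image is listed twice,
    the later classification is kept (an assumption for this sketch).
    """
    combined = {}
    for image_id, classification in list(existing) + list(additions):
        combined[image_id] = classification
    return sorted(combined.items())


existing = [("a", "Cat"), ("b", "Dog")]
additions = [("c", "Cat"), ("b", "Dog")]
full_set = build_retraining_set(existing, additions)
```

Here full_set contains all three images, and it is this full set, not just the additions, that would be supplied as Training Data.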
