Data types for training

Caila supports a range of data types on which you can train and test fittable services.

| Dataset type | Used in services | File example |
| --- | --- | --- |
| csv/faq | Classifiers, FAQ | Download |
| csv/texts-and-labels | Classifiers | Download |
| json/any | Any | |
| json/caila-intents | Classifiers, FAQ | Download |
| json/faq | Classifiers, FAQ | Download |
| json/lines | LLM fine-tuning service | Download |
| json/texts-and-labels | Classifiers | Download |
| json/texts | CDQA, loadtest | Download |
| json/transformer-fit | Classifiers | |
| plain/texts | CDQA, loadtest | Download |
| xlsx/faq | Classifiers, FAQ | Download |
| json/tts-dictionary | aimyvoice-custom | Download |

Type names start with a data format, such as json or csv, followed by a dataset content type after the slash.
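
The naming rule above can be illustrated with a minimal sketch that splits a type name into its two parts. The `DatasetType` tuple and the `parse_dataset_type` helper are hypothetical names used only for illustration; they are not part of the Caila API.

```python
from typing import NamedTuple


class DatasetType(NamedTuple):
    data_format: str   # the part before the slash, e.g. "json" or "csv"
    content_type: str  # the part after the slash, e.g. "faq"


def parse_dataset_type(name: str) -> DatasetType:
    """Split a type name such as 'json/faq' into format and content type."""
    data_format, sep, content_type = name.partition("/")
    if not sep or not data_format or not content_type:
        raise ValueError(f"expected '<format>/<content type>', got {name!r}")
    return DatasetType(data_format, content_type)


print(parse_dataset_type("csv/texts-and-labels"))
# DatasetType(data_format='csv', content_type='texts-and-labels')
```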

Data formats

| Format | Description | Extension |
| --- | --- | --- |
| plain | Plain text with no specific format. | Usually TXT |
| json | Text format that stores simple data structures and associative arrays (objects). | JSON |
| csv | Text format where each value is separated by a comma or other separator. The first row typically contains the names of the entity's data fields. Each following row represents the data for one entity. | CSV |
| xlsx | Format used in spreadsheet programs such as Microsoft Excel. The first row typically contains the names of the entity's data fields. Each following row represents the data for one entity. | XLS, XLSX |
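
As a rough illustration of the extension column, the sketch below guesses a data format from a file name. The mapping and the `guess_data_format` helper are assumptions made for this example; the source does not describe how Caila itself detects a format, and plain text is only "usually" stored as TXT, so treat the result as a heuristic.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format mapping taken from the table above (heuristic only).
EXTENSION_TO_FORMAT = {
    ".txt": "plain",
    ".json": "json",
    ".csv": "csv",
    ".xls": "xlsx",
    ".xlsx": "xlsx",
}


def guess_data_format(filename: str) -> Optional[str]:
    """Return the likely data format for a file name, or None if unknown."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


print(guess_data_format("intents.JSON"))  # json
print(guess_data_format("faq.xlsx"))      # xlsx
```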

Dataset content types

| Type | Description |
| --- | --- |
| any | A file of any format. Use this type if the built-in types are not suitable for you. The service must independently verify that the dataset content is correct. |
| caila-intents | A file with intents exported from the JAICP project. For more details on exporting intents and the data structure, please refer to the JAICP documentation. |
| faq | A file containing questions and answers, along with additional fields. Designed for training the FAQ service used in JAICP. For more information on the available fields, please refer to the JAICP documentation. |
| lines | A file in which each line is an object in JSON format. |
| texts | A file in which each line is plain text with no specific format. |
| texts-and-labels | A file with texts and their corresponding labels. |
| transformer-fit | Internal technical format file. |
| tts-dictionary | A file in which each text corresponds to the expected pronunciation. Used for configuring speech synthesis in Aimyvoice. |
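
To make the difference between the lines and texts content types concrete, here is a minimal sketch of reading each one. The helper names and the `prompt` and `completion` fields in the sample are hypothetical; the source does not specify which fields a lines dataset must contain. Skipping blank lines is also an assumption of this sketch.

```python
import json
from typing import Any, Dict, List


def read_lines_dataset(text: str) -> List[Dict[str, Any]]:
    """Parse a 'lines' dataset: every non-empty line is one JSON object."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        obj = json.loads(line)
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        records.append(obj)
    return records


def read_texts_dataset(text: str) -> List[str]:
    """Parse a 'texts' dataset: every non-empty line is one plain text."""
    return [line for line in text.splitlines() if line.strip()]


# Hypothetical sample; the field names are illustrative only.
sample = (
    '{"prompt": "Hi", "completion": "Hello"}\n'
    '{"prompt": "Bye", "completion": "See you"}\n'
)
print(len(read_lines_dataset(sample)))  # 2
```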

Automatic conversion

Automatic conversion of one dataset content type to another is implemented in Caila:

  • caila-intents → faq;
  • caila-intents → texts-and-labels;
  • faq → caila-intents;
  • texts-and-labels → transformer-fit.

If you upload a dataset in one format while the service requires another, the platform will attempt to convert your dataset automatically. Thus, automatic conversion expands the list of services you can train using your dataset.

A content type can be converted several times in a chain, for example: caila-intents → texts-and-labels → transformer-fit. In addition to content type conversion, data format conversion is also supported, for example csv → xlsx.
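
Chained conversion can be pictured as a path search over the content type conversions listed above. The sketch below is only an illustration under that assumption: the source does not describe how Caila actually selects a conversion chain, and `CONVERSIONS` and `find_conversion_path` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional

# Direct content type conversions, copied from the list above.
CONVERSIONS: Dict[str, List[str]] = {
    "caila-intents": ["faq", "texts-and-labels"],
    "faq": ["caila-intents"],
    "texts-and-labels": ["transformer-fit"],
}


def find_conversion_path(source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of conversions."""
    if source == target:
        return [source]
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        for next_type in CONVERSIONS.get(path[-1], []):
            if next_type in visited:
                continue
            if next_type == target:
                return path + [next_type]
            visited.add(next_type)
            queue.append(path + [next_type])
    return None  # no chain of conversions reaches the target


print(" -> ".join(find_conversion_path("faq", "transformer-fit")))
# faq -> caila-intents -> texts-and-labels -> transformer-fit
```

Under this model, a faq dataset could reach transformer-fit in three steps, while no chain leads from transformer-fit back to any other type, so the function returns `None` in that case.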