Data types for training
Caila supports a range of data types on which you can train and test fittable services.
Dataset type | Used in services | File example |
---|---|---|
csv/faq | Classifiers, FAQ | Download |
csv/texts-and-labels | Classifiers | Download |
json/any | Any | — |
json/caila-intents | Classifiers, FAQ | Download |
json/faq | Classifiers, FAQ | Download |
json/lines | LLM fine-tuning service | Download |
json/texts-and-labels | Classifiers | Download |
json/texts | CDQA, loadtest | Download |
json/transformer-fit | Classifiers | — |
plain/texts | CDQA, loadtest | Download |
xlsx/faq | Classifiers, FAQ | Download |
json/tts-dictionary | aimyvoice-custom | Download |
Type names start with a data format, such as json or csv, followed by a slash and the dataset content type.
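The naming rule can be illustrated with a small parser that splits a type name into its two parts. This is a minimal sketch: `parse_dataset_type` is a hypothetical helper, not part of any Caila API, and the sets of known formats and content types are simply copied from the tables in this article.

```python
# Hypothetical helper (not a Caila API): split a dataset type name such as
# "json/texts-and-labels" into its data format and content type.
# The known values below are taken from the tables in this article.
KNOWN_FORMATS = {"plain", "json", "csv", "xlsx"}
KNOWN_CONTENT_TYPES = {
    "any", "caila-intents", "faq", "lines", "texts",
    "texts-and-labels", "transformer-fit", "tts-dictionary",
}


def parse_dataset_type(type_name: str) -> tuple[str, str]:
    """Return (data_format, content_type) for a name like 'csv/faq'."""
    data_format, sep, content_type = type_name.partition("/")
    if not sep:
        raise ValueError(f"missing '/' in dataset type: {type_name!r}")
    if data_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown data format: {data_format!r}")
    if content_type not in KNOWN_CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    return data_format, content_type


print(parse_dataset_type("json/texts-and-labels"))  # ('json', 'texts-and-labels')
```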
Data formats
Format | Description | Extension |
---|---|---|
plain | Plain text with no specific format. | Usually TXT |
json | Text format that stores simple data structures and associative arrays (objects). | JSON |
csv | Text format in which values are separated by commas or another delimiter. The first row typically contains the names of the entity's data fields, and each subsequent row represents the data for one entity. | CSV |
xlsx | Spreadsheet format used by programs such as Microsoft Excel. The first row typically contains the names of the entity's data fields, and each subsequent row represents the data for one entity. | XLS, XLSX |
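To make the row-oriented layout of csv concrete, the sketch below reads CSV text whose first row holds field names and turns every following row into one record, using Python's standard `csv` module. The column names `text` and `label` are assumptions for illustration only; they are not the documented columns of any Caila dataset type.

```python
import csv
import io

# Illustrative CSV content: the first row holds field names, each later row is one entity.
# The column names "text" and "label" are assumptions for this example only.
SAMPLE = "text,label\nHello there,greeting\nBye for now,farewell\n"


def read_entities(csv_text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row's field names."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return list(reader)


for entity in read_entities(SAMPLE):
    print(entity["text"], "->", entity["label"])
```

The `delimiter` parameter covers files that use a separator other than a comma, such as a semicolon.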
Dataset content types
Type | Description |
---|---|
any | A file of any format. Use this type if none of the built-in types suits your data. The service itself must verify that the dataset content is correct. |
caila-intents | A file with intents exported from a JAICP project. For details on exporting intents and on the data structure, refer to the JAICP documentation. |
faq | A file containing questions and answers, along with additional fields. Designed for training the FAQ service used in JAICP. For more information on the available fields, please refer to the JAICP documentation. |
lines | A file in which each line is an object in JSON format. |
texts | A file in which each line is plain text with no specific format. |
texts-and-labels | A file with texts and their corresponding labels. |
transformer-fit | Internal technical format file. |
tts-dictionary | A file that maps each text to its expected pronunciation. Used to configure speech synthesis in Aimyvoice. |
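The lines and texts content types are both line-oriented. The following minimal sketch shows how such files could be parsed in Python. The function names are hypothetical, skipping blank lines is an assumption (the article does not say how blank lines are treated), and the `prompt`/`completion` field names in the sample are invented for illustration, not the schema expected by the LLM fine-tuning service.

```python
import json


def read_json_lines(content: str) -> list[dict]:
    """Parse lines content: every non-blank line is a separate JSON object."""
    records = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        obj = json.loads(line)
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_no} is not a JSON object")
        records.append(obj)
    return records


def read_plain_texts(content: str) -> list[str]:
    """Parse texts content: every non-blank line is one text."""
    return [line for line in content.splitlines() if line.strip()]


# Hypothetical field names, used only to show the one-object-per-line layout.
jsonl = '{"prompt": "Hi", "completion": "Hello!"}\n{"prompt": "Bye", "completion": "See you."}\n'
print(read_json_lines(jsonl)[1]["completion"])  # See you.
print(read_plain_texts("First text\n\nSecond text\n"))  # ['First text', 'Second text']
```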
Automatic conversion
Caila automatically converts between the following dataset content types: caila-intents → faq; caila-intents → texts-and-labels; faq → caila-intents; texts-and-labels → transformer-fit.
If you upload a dataset in one format while the service requires another, the platform will attempt to convert your dataset automatically. Thus, automatic conversion expands the list of services you can train using your dataset.
The content type can be converted several times in a row, for example: caila-intents → texts-and-labels → transformer-fit.
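One way to picture chained conversion is to treat the listed conversions as a directed graph and search it for a path. The sketch below does this with breadth-first search, which returns the shortest chain. It is only an illustration of the idea under that assumption: `CONVERSIONS` and `conversion_chain` are hypothetical names, data format conversion (such as csv → xlsx) is not modeled, and how the platform actually chooses a chain is not described in this article.

```python
from collections import deque
from typing import Optional

# Content-type conversions listed in this article (source -> possible targets).
CONVERSIONS = {
    "caila-intents": ["faq", "texts-and-labels"],
    "faq": ["caila-intents"],
    "texts-and-labels": ["transformer-fit"],
}


def conversion_chain(source: str, target: str) -> Optional[list[str]]:
    """Return the shortest chain of content types from source to target, or None."""
    if source == target:
        return [source]
    previous = {source: None}  # remembers how each content type was reached
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in CONVERSIONS.get(current, []):
            if nxt in previous:
                continue  # already visited
            previous[nxt] = current
            if nxt == target:
                # Walk back from the target to the source, then reverse.
                chain = [nxt]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(nxt)
    return None


print(conversion_chain("caila-intents", "transformer-fit"))
# ['caila-intents', 'texts-and-labels', 'transformer-fit']
print(conversion_chain("transformer-fit", "faq"))  # None
```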
In addition to content type conversion, data format conversion is also supported, for example csv → xlsx.