Architecture

Vision Transformer (ViT)

Definition

An adaptation of the transformer architecture to image data that treats fixed-size image patches as tokens. When pretrained on large datasets, ViTs match or outperform convolutional networks on many computer vision benchmarks, and they are used in medical imaging, satellite analysis, and industrial quality control.
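
To make "patches as tokens" concrete, here is a minimal sketch in plain Python. It assumes a single-channel image stored as a list of rows and a hypothetical helper named `image_to_patches`; neither comes from any particular library. A full ViT would typically go on to project each flattened patch linearly and add position embeddings, which this sketch leaves out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only; real pipelines usually do this
    with tensor reshapes or a strided convolution.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size block into one vector (one token).
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 single-channel "image" with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patches(image, patch_size=2)
print(len(tokens), tokens[0])  # 4 tokens; the first is [0, 1, 4, 5]
```

With a 4x4 image and 2x2 patches, the image becomes a sequence of four tokens of length four, which is the form a transformer consumes.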

Related Terms

Transformer

The dominant neural network architecture for language, vision, and multimodal AI, introduced in the 2017 paper "Attention Is All You Need". Transformers use self-attention to relate every token to every other token in parallel, which makes training on internet-scale data practical and underlies nearly every major LLM in use today.
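
As a rough illustration of self-attention, the sketch below computes scaled dot-product attention over a short token sequence in plain Python. It is deliberately simplified: the function names are hypothetical, and the learned query, key, and value projections of a real transformer are assumed to be identity maps, so each token serves as its own query, key, and value. Production code expresses the same computation as batched matrix multiplications, which is what lets all tokens be processed in parallel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: identity projections (queries = keys = values = tokens);
    a real transformer learns separate projection matrices for each.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(tokens)
print(attended)
```

Each output row is a weighted average of the input tokens, with the largest weight on the tokens most similar to the query.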

Computer Vision

A field of AI that enables machines to interpret visual information from the world, such as images, video, and sensor feeds. It underpins applications from quality-control cameras on factory floors to facial recognition and autonomous vehicle perception.

Multimodal AI

AI systems that process and reason across multiple types of data at once, such as text, images, audio, and video. Multimodal models enable richer enterprise applications such as document understanding that combines tables, charts, and prose.
