The total number of tokens allocated for a model request, encompassing both input (prompt + context) and output. Managing token budgets is central to controlling inference cost in production LLM applications, especially when processing long documents or maintaining conversational history.
احجز استشارة لمناقشة كيفية تطبيق مفاهيم الذكاء الاصطناعي على تحدياتك.