This is the configuration class to store the configuration of a LlamaModel. It defines the model architecture according to the specified arguments. Instantiating a configuration with the defaults will yield a configuration similar to that of LLaMA-7B. Configuration objects inherit from PretrainedConfig and can be used to control the model outputs.

Arguments:

num_attention_heads: Number of attention heads for each attention layer in the Transformer encoder.
hidden_act (str or function, optional, defaults to "silu"): The non-linear activation function (function or string) in the decoder.
max_position_embeddings (int, optional, defaults to 2048): The maximum sequence length that this model might ever be used with.
initializer_range (float, optional, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
rms_norm_eps (float, optional, defaults to 1e-12): The epsilon used by the rms normalization layers.
use_cache (bool, optional, defaults to True): Whether or not the model should return the last key/values attentions (not used by all models).
tie_word_embeddings (bool, optional, defaults to False).
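As a minimal sketch of how these arguments fit together (assuming the Hugging Face `transformers` package is installed): instantiating `LlamaConfig` with no arguments gives the LLaMA-7B-like defaults, while passing keyword arguments overrides them. The small sizes below are illustrative values, not defaults.

```python
from transformers import LlamaConfig, LlamaModel

# Default configuration: similar to LLaMA-7B
default_config = LlamaConfig()

# A deliberately small configuration for quick experiments; the keyword
# names are the documented LlamaConfig arguments, the sizes here are
# illustrative assumptions chosen to keep the model tiny.
small_config = LlamaConfig(
    hidden_size=128,
    intermediate_size=256,
    num_hidden_layers=2,
    num_attention_heads=4,
    max_position_embeddings=2048,
    hidden_act="silu",
    use_cache=True,
    tie_word_embeddings=False,
)

# Initialize a model (with random weights) from the configuration
model = LlamaModel(small_config)
```

Note that building a model from the *default* configuration allocates roughly 7B parameters, so a reduced configuration like `small_config` is the usual choice for testing.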