Meta Unveils Llama 3.1: A Challenger in the AI Arena
Meta launches new Llama 3.1 models, including the anticipated 405B-parameter version.
Meta has released Llama 3.1, a collection of multilingual large language models (LLMs). Llama 3.1 comprises pretrained and instruction-tuned text-in/text-out open source generative AI models with 8B, 70B, and 405B parameters.
Starting today, IBM watsonx.ai offers the instruction-tuned Llama 3.1-405B, the largest and most powerful open source language model available to date and one that is competitive with the best proprietary models. It can be deployed on-premises, in a hybrid cloud environment, or on IBM Cloud.
Llama 3.1 follows the April 18 debut of the Llama 3 models. In that launch announcement, Meta stated that "[their] goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across LLM capabilities such as reasoning and coding."
Llama 3.1's debut today shows tremendous progress towards that goal, from dramatically extended context length to tool use and multilingual features.
A significant step towards open, responsible, accessible AI innovation
Meta and IBM launched the AI Alliance in December 2023 with more than 50 founding members and collaborators worldwide. The AI Alliance unites leading business, startup, university, research, and government organisations to guide AI's evolution so that it meets society's requirements and complexities. Since its formation, the Alliance has grown to over 100 members.
Additionally, the AI Alliance promotes an open community that helps developers and researchers accelerate responsible innovation while maintaining trust, safety, security, diversity, scientific rigour, and economic competitiveness. To that end, the Alliance supports initiatives that develop and deploy benchmarks and evaluation standards, address society-wide issues, enhance global AI capabilities, and promote safe and useful AI development.
Llama 3.1 gives the global AI community an open, state-of-the-art model family and development ecosystem to explore, experiment, and responsibly scale new ideas and techniques. The release features strong new models, system-level safety safeguards, cyber security evaluation methods, and improved inference-time guardrails. These resources encourage standardisation of trust and safety tools for generative AI.
How Llama 3.1-405B compares to top models
The April release of Llama 3 previewed upcoming Llama models with "over 400B parameters" and shared some early performance evaluations, but their exact size and details were not made public until today's debut. Llama 3.1 improves on every model size, but the 405B is the first open source model to match leading proprietary, closed source LLMs.
Looking beyond numbers
Performance benchmarks are not the only factor when comparing the 405B to other cutting-edge models. Unlike its closed source contemporaries, which can change their model without notice, Llama 3.1-405B can be built upon, modified, and run on-premises. That level of control and predictability benefits researchers, businesses, and other entities that seek consistency and repeatability.
Using Llama 3.1-405B effectively
IBM, like Meta, believes open models improve product safety, innovation, and the AI market. An advanced 405B-parameter open source model offers unique potential and use cases for organisations of all sizes.
Aside from direct inference and text generation, which may require quantisation or other optimisation approaches to run locally on most hardware systems, the 405B can be used for:
Synthetic data generation: When suitable data is scarce or expensive, synthetic data can fill the gap in pre-training, fine-tuning, and instruction tuning. The 405B can generate high-quality task- and domain-specific synthetic data for training other LLMs. IBM's Large-scale Alignment for chatBots (LAB) is a phased-training approach that quickly updates LLMs with synthetic data while preserving the model's existing knowledge.
Knowledge distillation: The 405B model's knowledge and emergent abilities can be distilled into a smaller model, combining the capabilities of a large "teacher" model with the quick and cost-effective inference of a "student" model (such as an 8B or 70B Llama 3.1). Knowledge distillation, particularly instruction tuning on synthetic data generated by larger GPT models, was essential to influential Llama-based models like Alpaca and Vicuna.
LLM-as-a-judge: The subjectivity of human preferences, and the imperfect ability of existing benchmarks to approximate them, make LLM evaluation difficult. The Llama 2 research paper showed that larger models can serve as impartial judges of response quality in other models. Learn more about LLM-as-a-judge's efficacy in this 2023 article.
A powerful domain-specific fine-tune: Many leading closed models allow fine-tuning only on a case-by-case basis, only for older or smaller model versions, or not at all. Meta has made Llama 3.1-405B available for continued pre-training (to update the model's general knowledge) or domain-specific fine-tuning, which is coming soon to the watsonx Tuning Studio.
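To make the LLM-as-a-judge idea concrete, here is a minimal Python sketch of a pairwise comparison. Everything in it is an illustrative assumption rather than anything published by Meta or IBM: the prompt wording, the function names, and the trick of asking twice with the answers swapped to reduce position bias. The judge argument is any callable that takes a prompt string and returns the model's reply; prefer_longer is a toy stand-in so the example runs without a model.

```python
from typing import Callable, Optional

# Hypothetical prompt wording; a real deployment would tune this carefully.
JUDGE_TEMPLATE = (
    "You are an impartial judge. Compare two answers to the question.\n"
    "Question: {question}\n"
    "Answer A: {answer_a}\n"
    "Answer B: {answer_b}\n"
    "Reply with exactly one letter: A or B."
)


def parse_verdict(reply: str) -> Optional[str]:
    """Return 'A' or 'B' if the reply is exactly one of those letters, else None."""
    cleaned = reply.strip().upper()
    return cleaned if cleaned in ("A", "B") else None


def judge_pair(question: str, answer_a: str, answer_b: str,
               judge: Callable[[str], str]) -> str:
    """Ask the judge twice, swapping answer order; return 'A', 'B', or 'tie'."""
    first = parse_verdict(judge(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b)))
    swapped = parse_verdict(judge(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_b, answer_b=answer_a)))
    # In the swapped prompt, position "A" holds the original answer B.
    swapped_back = {"A": "B", "B": "A"}.get(swapped)
    if first is not None and first == swapped_back:
        return first
    return "tie"  # inconsistent or unparseable verdicts are not trusted


def prefer_longer(prompt: str) -> str:
    """Toy stand-in for a judge model: always prefers the longer answer."""
    lines = prompt.splitlines()
    a = next(line for line in lines if line.startswith("Answer A: "))
    b = next(line for line in lines if line.startswith("Answer B: "))
    return "A" if len(a) > len(b) else "B"
```

In practice the judge callable would wrap a call to a large model such as the 405B. A judge that always picks the first position produces conflicting verdicts across the two orderings, so judge_pair reports a tie instead of a winner.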
To deploy Llama 3.1 models, Meta AI "strongly recommends" using a platform like IBM watsonx that provides model evaluation, safety guardrails, and retrieval augmented generation.
Every Llama 3.1 size gets upgrades
The long-awaited 405B model may be the most notable component of Llama 3.1, but it's hardly the only one. Llama 3.1 models share the dense transformer design of Llama 3, but they are much improved at all model sizes.
Longer context windows
All pre-trained and instruction-tuned Llama 3.1 models have context lengths of 128,000 tokens, roughly 16 times the 8,192 tokens of Llama 3 (an increase of about 1,460%). Llama 3.1's context length matches that of the enterprise version of GPT-4o, is substantially longer than that of GPT-4 (or ChatGPT Free), and is comparable to Claude 3's 200,000-token window. Because Llama 3.1 can be deployed on the user's own hardware or through a cloud provider, its context length is not reduced during periods of high demand. Likewise, Llama 3.1 has few usage restrictions.
An LLM can consider or "remember" only a certain amount of tokenised text (called its context window) at any given moment. To continue once a conversation, document, or code base exceeds that context length, the model must trim or summarise it. Llama 3.1's extended context window lets models hold longer conversations without forgetting details and ingest larger texts or code samples during training and inference.
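The trimming step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular chat product manages history: trim_to_context and word_count are hypothetical names, and the default token counter simply counts whitespace-separated words, where a real application would plug in the model's own tokeniser.

```python
from typing import Callable, List


def word_count(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def trim_to_context(messages: List[str], max_tokens: int,
                    count_tokens: Callable[[str], int] = word_count) -> List[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With a small budget, the oldest messages are dropped first; with a 128,000-token budget, far more of the history survives before any trimming is needed.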
Text-to-token conversion doesn't have a fixed "exchange rate," but 1.5 tokens per word is a reasonable estimate. By that estimate, Llama 3.1's 128,000-token context window holds about 85,000 words. The Hugging Face Tokenizer Playground lets you test multiple tokenisation models on text inputs.
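The arithmetic behind these figures is easy to check. The short Python sketch below uses the 1.5 tokens-per-word rule of thumb from the text; the function name is illustrative, and real ratios vary by language and tokeniser.

```python
TOKENS_PER_WORD = 1.5  # rule-of-thumb estimate, not a fixed property of any tokeniser


def words_that_fit(context_tokens: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate how many words fit in a context window of the given token length."""
    return int(context_tokens / tokens_per_word)


llama_3_1_words = words_that_fit(128_000)  # 85333, i.e. about 85,000 words
llama_3_words = words_that_fit(8_192)      # 5461
growth_factor = 128_000 / 8_192            # 15.625
```

By the same estimate, the 8,192-token window of Llama 3 held only about 5,500 words.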
Llama 3.1 models also benefit from the new tokeniser introduced with Llama 3, which encodes language more efficiently than the Llama 2 tokeniser.
Protecting safety
Meta has expanded context length cautiously and thoroughly, in line with its responsible innovation approach. Earlier experimental open source efforts produced Llama derivatives with 128,000-token or even 1M-token windows. Although these projects show the benefits of Meta's commitment to open models, they should be approached with caution: without strong countermeasures, long context windows "present a rich new attack surface for LLMs," according to recent research.
Fortunately, Llama 3.1 adds new inference guardrails. The release includes updated versions of Llama Guard and CyberSec Eval, along with Prompt Guard, which filters direct and indirect prompt injections. Meta also offers CodeShield, an inference-time filtering tool that helps prevent insecure LLM-generated code from entering production systems.
As with any generative AI solution, models should be deployed on a secure, private, and safe platform.
Multilingual models
Pretrained and instruction-tuned Llama 3.1 models of all sizes are now multilingual. In addition to English, Llama 3.1 models are conversant in Spanish, Portuguese, Italian, German, and Thai. Meta said "a few other languages" are still undergoing post-training validation and may be released later.
Optimised for tools
Meta optimised the Llama 3.1 Instruct models for "tool use," allowing them to interface with applications that extend the LLM's capabilities. Training covered generating tool calls for specific search, image generation, code execution, and mathematical reasoning tools, as well as zero-shot tool use: the ability to work smoothly with tools not previously encountered in training.
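On the application side, tool use usually means parsing a structured request from the model and routing it to a local function. The Python sketch below shows that routing step under stated assumptions: the JSON shape ({"name": ..., "arguments": {...}}), the TOOLS registry, and both example tools are hypothetical and are not Meta's actual tool-call format, which is defined in the Llama 3.1 model documentation.

```python
import json
from typing import Any, Callable, Dict


def add(a: float, b: float) -> float:
    """Example tool: add two numbers."""
    return a + b


def count_words(text: str) -> int:
    """Example tool: count whitespace-separated words."""
    return len(text.split())


# Hypothetical registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "count_words": count_words}


def dispatch_tool_call(raw_reply: str,
                       tools: Dict[str, Callable[..., Any]] = TOOLS) -> Any:
    """Run the tool named in a JSON reply; return None if the reply is not a tool call."""
    try:
        call = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None  # ordinary text reply, nothing to dispatch
    if not isinstance(call, dict) or "name" not in call:
        return None
    name = call["name"]
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    # Pass the model-supplied arguments to the tool as keyword arguments.
    return tools[name](**call.get("arguments", {}))
```

In a full loop, the tool's result would be sent back to the model as a new message so it can compose the final answer. A production system would also validate the arguments before executing anything the model requests.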
Getting started with Llama 3.1
Meta's latest release lets you customise state-of-the-art generative AI models for your use case.
IBM supports Llama 3.1 to promote open source AI innovation and give clients access to best-in-class open models in watsonx, including third-party models and the IBM Granite model family.
IBM watsonx allows clients to deploy open source models like Llama 3.1 on-premises or in their preferred cloud environment and to use intuitive workflows for fine-tuning, prompt engineering, and integration with enterprise applications. Build business-specific AI apps, manage data sources, and accelerate safe AI workflows on one platform.