From ad5b0adbab2d03ddbf0e9bbd0f759abc24d3cb73 Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:01:59 -0700
Subject: [PATCH 1/4] docs

---
 docs/api-reference/models.md                  |  3 +
 .../concepts/model-providers/llamaapi.md      | 70 +++++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 docs/user-guide/concepts/model-providers/llamaapi.md

diff --git a/docs/api-reference/models.md b/docs/api-reference/models.md
index 5f07e08a..24b6cf29 100644
--- a/docs/api-reference/models.md
+++ b/docs/api-reference/models.md
@@ -11,3 +11,6 @@
 ::: strands.models.ollama
     options:
       heading_level: 2
+::: strands.models.llamaapi
+    options:
+      heading_level: 2
diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
new file mode 100644
index 00000000..87033824
--- /dev/null
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -0,0 +1,70 @@
+# Llama API
+
+[Llama API](https://llama.developer.meta.com/) is a Meta-hosted API service that helps you integrate Llama models into your applications quickly and efficiently.
+
+Llama API provides access to Llama models through a simple API interface, with inference provided by Meta, so you can focus on building AI-powered solutions without managing your own inference infrastructure.
+
+With Llama API, you get access to state-of-the-art AI capabilities through a developer-friendly interface designed for simplicity and performance.
+
+## Installation
+
+Llama API is configured as an optional dependency in Strands Agents. To install, run:
+
+```bash
+pip install strands-agents[llamaapi]
+```
+
+## Usage
+
+After installing `llamaapi`, you can import and initialize Strands Agents' Llama API provider as follows:
+
+```python
+from strands import Agent
+from strands.models.llamaapi import LlamaAPIModel
+from strands_tools import calculator
+
+model = LlamaAPIModel(
+    client_args={
+        "api_key": "",
+    },
+    # **model_config
+    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
+)
+
+agent = Agent(model=model, tools=[calculator])
+response = agent("What is 2+2")
+print(response)
+```
+
+## Configuration
+
+### Client Configuration
+
+The `client_args` configure the underlying LlamaAPI client. For a complete list of available arguments, please refer to the LlamaAPI [docs](https://llama.developer.meta.com/docs/).
+
+
+### Model Configuration
+
+The `model_config` configures the underlying model selected for inference. The supported configurations are:
+
+| Parameter | Description | Example | Options |
+|------------|-------------|---------|---------|
+| `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `temperature` | Controls randomness of the response by setting a temperature. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `max_completion_tokens` | The maximum number of tokens to generate. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `top_k` | Only sample from the top K options for each subsequent token.
+ | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+
+
+## Troubleshooting
+
+### Module Not Found
+
+If you encounter the error `ModuleNotFoundError: No module named 'llamaapi'`, this means you haven't installed the `llamaapi` dependency in your environment. To fix, run `pip install strands-agents[llamaapi]`.
+
+## References
+
+- [API](../../../api-reference/models.md)
+- [LlamaAPI](https://llama.developer.meta.com/docs/)

From f496fed0d6296ed78940fa8a11d2adedfb2b62bc Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:38:53 -0700
Subject: [PATCH 2/4] fix docs

---
 docs/user-guide/concepts/model-providers/llamaapi.md | 11 +++++------
 mkdocs.yml                                           |  5 +++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
index 87033824..debcd64a 100644
--- a/docs/user-guide/concepts/model-providers/llamaapi.md
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -50,12 +50,11 @@ The `model_config` configures the underlying model selected for inference. The s
 | Parameter | Description | Example | Options |
 |------------|-------------|---------|---------|
 | `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `temperature` | Controls randomness of the response by setting a temperature. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `max_completion_tokens` | The maximum number of tokens to generate. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `top_k` | Only sample from the top K options for each subsequent token.
- | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/)
+| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/)
+| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/)
+| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/)
 
 
 ## Troubleshooting
diff --git a/mkdocs.yml b/mkdocs.yml
index 84a663c7..f77221fe 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -75,6 +75,7 @@ nav:
       - Amazon Bedrock: user-guide/concepts/model-providers/amazon-bedrock.md
       - LiteLLM: user-guide/concepts/model-providers/litellm.md
       - Ollama: user-guide/concepts/model-providers/ollama.md
+      - LlamaAPI: user-guide/concepts/model-providers/llamaapi.md
       - Custom Providers: user-guide/concepts/model-providers/custom_model_provider.md
     - Streaming:
       - Async Iterators: user-guide/concepts/streaming/async-iterators.md
@@ -105,7 +106,7 @@ nav:
       - Weather Forecaster: examples/python/weather_forecaster.md
       - File Operations: examples/python/file_operations.md
       - Agents Workflows: examples/python/agents_workflows.md
-      - Knowledge-Base Workflow: examples/python/knowledge_base_agent.md
+      - Knowledge-Base Workflow: examples/python/knowledge_base_agent.md
       - Multi Agents: examples/python/multi_agent_example/multi_agent_example.md
       - Meta Tooling: examples/python/meta_tooling.md
       - MCP: examples/python/mcp_calculator.md
@@ -167,4 +168,4 @@ validation:
   not_found: warn
   anchors: warn
   absolute_links: warn
-  unrecognized_links: warn
\ No newline at end of file
+  unrecognized_links: warn

From a4fbb846ae169f71505fc9e477342b3e34a77523 Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:46:45 -0700
Subject: [PATCH 3/4] links

---
 docs/user-guide/concepts/model-providers/llamaapi.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
index debcd64a..dbe0390b 100644
--- a/docs/user-guide/concepts/model-providers/llamaapi.md
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -50,11 +50,11 @@ The `model_config` configures the underlying model selected for inference. The s
 | Parameter | Description | Example | Options |
 |------------|-------------|---------|---------|
 | `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/)
-| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/)
-| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/)
-| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/)
-| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/)
+| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/api/chat)
 
 
 ## Troubleshooting

From 9714c0f8246e2be487f82134aa03ae5c8e9ea2bb Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:48:50 -0700
Subject: [PATCH 4/4] doc

---
 docs/user-guide/concepts/model-providers/llamaapi.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
index dbe0390b..c07b68e4 100644
--- a/docs/user-guide/concepts/model-providers/llamaapi.md
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -51,10 +51,10 @@
 |------------|-------------|---------|---------|
 | `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
 | `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `temperature` | Controls randomness of the response by setting a temperature. | `0.7` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `0.9` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `max_completion_tokens` | The maximum number of tokens to generate. | `4096` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_k` | Only sample from the top K options for each subsequent token. | `10` | [reference](https://llama.developer.meta.com/docs/api/chat)
 
 
 ## Troubleshooting
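The documentation added by these patches describes the model configuration parameters only in the table. A minimal sketch of how the documented values could be combined is shown below; it assumes the `LlamaAPIModel` constructor accepts the table's parameters (`temperature`, `top_p`, `top_k`, `repetition_penalty`, `max_completion_tokens`) as keyword arguments alongside `model_id`, as the `# **model_config` comment in the Usage example suggests. The example values are taken directly from the table; consult the Llama API reference linked there for valid ranges and defaults.

```python
from strands import Agent
from strands.models.llamaapi import LlamaAPIModel

# Sketch only: the parameter names come from the Model Configuration table in
# llamaapi.md; passing them as keyword arguments is an assumption based on the
# "# **model_config" comment in the documented usage example.
model = LlamaAPIModel(
    client_args={
        "api_key": "",  # supply your Llama API key here
    },
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
    temperature=0.7,             # randomness of the response
    top_p=0.9,                   # probability threshold when choosing the next token
    top_k=10,                    # sample only from the top K candidate tokens
    repetition_penalty=1,        # 1 = no penalty; documented maximum is 2
    max_completion_tokens=4096,  # cap on generated tokens
)

agent = Agent(model=model)
print(agent("Summarize what Llama API provides in one sentence."))
```

If the constructor does not accept these fields directly, the same values belong wherever the provider expects its model configuration; the reference links in the table remain the authoritative source.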