From ad5b0adbab2d03ddbf0e9bbd0f759abc24d3cb73 Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:01:59 -0700
Subject: [PATCH 1/4] docs

---
 docs/api-reference/models.md                  |  3 +
 .../concepts/model-providers/llamaapi.md      | 70 +++++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 docs/user-guide/concepts/model-providers/llamaapi.md

diff --git a/docs/api-reference/models.md b/docs/api-reference/models.md
index 5f07e08a..24b6cf29 100644
--- a/docs/api-reference/models.md
+++ b/docs/api-reference/models.md
@@ -11,3 +11,6 @@
 ::: strands.models.ollama
     options:
       heading_level: 2
+::: strands.models.llamaapi
+    options:
+      heading_level: 2
diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
new file mode 100644
index 00000000..87033824
--- /dev/null
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -0,0 +1,70 @@
+# Llama API
+
+[Llama API](https://llama.developer.meta.com/) is a Meta-hosted API service that helps you integrate Llama models into your applications quickly and efficiently.
+
+Llama API provides access to Llama models through a simple API interface, with inference provided by Meta, so you can focus on building AI-powered solutions without managing your own inference infrastructure.
+
+With Llama API, you get access to state-of-the-art AI capabilities through a developer-friendly interface designed for simplicity and performance.
+
+## Installation
+
+Llama API is configured as an optional dependency in Strands Agents. To install, run:
+
+```bash
+pip install strands-agents[llamaapi]
+```
+
+## Usage
+
+After installing `llamaapi`, you can import and initialize Strands Agents' Llama API provider as follows:
+
+```python
+from strands import Agent
+from strands.models.llamaapi import LlamaAPIModel
+from strands_tools import calculator
+
+model = LlamaAPIModel(
+    client_args={
+        "api_key": "",
+    },
+    # **model_config
+    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
+)
+
+agent = Agent(model=model, tools=[calculator])
+response = agent("What is 2+2")
+print(response)
+```
+
+## Configuration
+
+### Client Configuration
+
+The `client_args` configure the underlying LlamaAPI client. For a complete list of available arguments, please refer to the LlamaAPI [docs](https://llama.developer.meta.com/docs/).
+
+
+### Model Configuration
+
+The `model_config` configures the underlying model selected for inference. The supported configurations are:
+
+| Parameter | Description | Example | Options |
+|------------|-------------|---------|---------|
+| `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `temperature` | Controls randomness of the response by setting a temperature. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `max_completion_tokens` | The maximum number of tokens to generate. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `top_k` | Only sample from the top K options for each subsequent token.
+ | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+
+
+## Troubleshooting
+
+### Module Not Found
+
+If you encounter the error `ModuleNotFoundError: No module named 'llamaapi'`, this means you haven't installed the `llamaapi` dependency in your environment. To fix, run `pip install strands-agents[llamaapi]`.
+
+## References
+
+- [API](../../../api-reference/models.md)
+- [LlamaAPI](https://llama.developer.meta.com/docs/)

From f496fed0d6296ed78940fa8a11d2adedfb2b62bc Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:38:53 -0700
Subject: [PATCH 2/4] fix docs

---
 docs/user-guide/concepts/model-providers/llamaapi.md | 11 +++++------
 mkdocs.yml                                           |  5 +++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
index 87033824..debcd64a 100644
--- a/docs/user-guide/concepts/model-providers/llamaapi.md
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -50,12 +50,11 @@ The `model_config` configures the underlying model selected for inference. The s
 | Parameter | Description | Example | Options |
 |------------|-------------|---------|---------|
 | `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `temperature` | Controls randomness of the response by setting a temperature. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `max_completion_tokens` | The maximum number of tokens to generate. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `top_k` | Only sample from the top K options for each subsequent token.
- | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
+| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/)
+| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/)
+| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/)
+| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/)
 
 
 ## Troubleshooting
diff --git a/mkdocs.yml b/mkdocs.yml
index 84a663c7..f77221fe 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -75,6 +75,7 @@ nav:
       - Amazon Bedrock: user-guide/concepts/model-providers/amazon-bedrock.md
       - LiteLLM: user-guide/concepts/model-providers/litellm.md
       - Ollama: user-guide/concepts/model-providers/ollama.md
+      - LlamaAPI: user-guide/concepts/model-providers/llamaapi.md
       - Custom Providers: user-guide/concepts/model-providers/custom_model_provider.md
     - Streaming:
       - Async Iterators: user-guide/concepts/streaming/async-iterators.md
@@ -105,7 +106,7 @@ nav:
       - Weather Forecaster: examples/python/weather_forecaster.md
       - File Operations: examples/python/file_operations.md
       - Agents Workflows: examples/python/agents_workflows.md
-      - Knowledge-Base Workflow: examples/python/knowledge_base_agent.md
+      - Knowledge-Base Workflow: examples/python/knowledge_base_agent.md
       - Multi Agents: examples/python/multi_agent_example/multi_agent_example.md
       - Meta Tooling: examples/python/meta_tooling.md
       - MCP: examples/python/mcp_calculator.md
@@ -167,4 +168,4 @@ validation:
   not_found: warn
   anchors: warn
   absolute_links: warn
-  unrecognized_links: warn
\ No newline at end of file
+  unrecognized_links: warn

From a4fbb846ae169f71505fc9e477342b3e34a77523 Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:46:45 -0700
Subject: [PATCH 3/4] links

---
 docs/user-guide/concepts/model-providers/llamaapi.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
index debcd64a..dbe0390b 100644
--- a/docs/user-guide/concepts/model-providers/llamaapi.md
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -50,11 +50,11 @@ The `model_config` configures the underlying model selected for inference. The s
 | Parameter | Description | Example | Options |
 |------------|-------------|---------|---------|
 | `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
-| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/)
-| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/)
-| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/)
-| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/)
-| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/)
+| `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/api/chat)
 
 
 ## Troubleshooting

From 9714c0f8246e2be487f82134aa03ae5c8e9ea2bb Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Fri, 16 May 2025 09:48:50 -0700
Subject: [PATCH 4/4] doc

---
 docs/user-guide/concepts/model-providers/llamaapi.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user-guide/concepts/model-providers/llamaapi.md b/docs/user-guide/concepts/model-providers/llamaapi.md
index dbe0390b..c07b68e4 100644
--- a/docs/user-guide/concepts/model-providers/llamaapi.md
+++ b/docs/user-guide/concepts/model-providers/llamaapi.md
@@ -51,10 +51,10 @@
 |------------|-------------|---------|---------|
 | `model_id` | ID of a model to use | `Llama-4-Maverick-17B-128E-Instruct-FP8` | [reference](https://llama.developer.meta.com/docs/)
 | `repetition_penalty` | Controls the likelihood of generating repetitive responses. (minimum: 1, maximum: 2, default: 1) | `1` | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `temperature` | Controls randomness of the response by setting a temperature. | 0.7 | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | 0.9 | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `max_completion_tokens` | The maximum number of tokens to generate. | 4096 | [reference](https://llama.developer.meta.com/docs/api/chat)
-| `top_k` | Only sample from the top K options for each subsequent token. | 10 | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `temperature` | Controls randomness of the response by setting a temperature. | `0.7` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `0.9` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `max_completion_tokens` | The maximum number of tokens to generate. | `4096` | [reference](https://llama.developer.meta.com/docs/api/chat)
+| `top_k` | Only sample from the top K options for each subsequent token. | `10` | [reference](https://llama.developer.meta.com/docs/api/chat)
 
 
 ## Troubleshooting
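The documentation added by these patches describes the model configuration parameters only in the table. A minimal sketch of how the documented values could be combined is shown below; it assumes the `LlamaAPIModel` constructor accepts the table's parameters (`temperature`, `top_p`, `top_k`, `repetition_penalty`, `max_completion_tokens`) as keyword arguments alongside `model_id`, as the `# **model_config` comment in the Usage example suggests. The example values are taken directly from the table; consult the Llama API reference linked there for valid ranges and defaults.

```python
from strands import Agent
from strands.models.llamaapi import LlamaAPIModel

# Sketch only: the parameter names come from the Model Configuration table in
# llamaapi.md; passing them as keyword arguments is an assumption based on the
# "# **model_config" comment in the documented usage example.
model = LlamaAPIModel(
    client_args={
        "api_key": "",  # supply your Llama API key here
    },
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
    temperature=0.7,             # randomness of the response
    top_p=0.9,                   # probability threshold when choosing the next token
    top_k=10,                    # sample only from the top K candidate tokens
    repetition_penalty=1,        # 1 = no penalty; documented maximum is 2
    max_completion_tokens=4096,  # cap on generated tokens
)

agent = Agent(model=model)
print(agent("Summarize what Llama API provides in one sentence."))
```

If the constructor does not accept these fields directly, the same values belong wherever the provider expects its model configuration; the reference links in the table remain the authoritative source.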