Skip to content

Commit 1dbc655

Browse files
Merge pull request #1066 from JohnSnowLabs/chore/final_website_updates
update the README.md file
2 parents 2ee2c22 + 7f0f46d commit 1dbc655

File tree

2 files changed

+4
-2
lines changed

2 files changed

+4
-2
lines changed

README.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -105,7 +105,9 @@ You can check out the following LangTest articles:
105105
| [**Testing the Robustness of LSTM-Based Sentiment Analysis Models**](https://medium.com/john-snow-labs/testing-the-robustness-of-lstm-based-sentiment-analysis-models-67ed84e42997) | Explore the robustness of custom models with LangTest Insights.|
106106
| [**LangTest Insights: A Deep Dive into LLM Robustness on OpenBookQA**](https://medium.com/john-snow-labs/langtest-insights-a-deep-dive-into-llm-robustness-on-openbookqa-ab0ddcbd2ab1) | Explore the robustness of Language Models (LLMs) on the OpenBookQA dataset with LangTest Insights.|
107107
| [**LangTest: A Secret Weapon for Improving the Robustness of Your Transformers Language Models**](https://medium.com/john-snow-labs/langtest-a-secret-weapon-for-improving-the-robustness-of-your-transformers-language-models-9693d64256cc) | Explore the robustness of Transformers Language Models with LangTest Insights.|
108-
108+
| [**Mastering Model Evaluation: Introducing the Comprehensive Ranking & Leaderboard System in LangTest**](https://medium.com/john-snow-labs/mastering-model-evaluation-introducing-the-comprehensive-ranking-leaderboard-system-in-langtest-5242927754bb) | The Model Ranking & Leaderboard system by John Snow Labs' LangTest offers a systematic approach to evaluating AI models with comprehensive ranking, historical comparisons, and dataset-specific insights, empowering researchers and data scientists to make data-driven decisions on model performance. |
109+
| [**Evaluating Long-Form Responses with Prometheus-Eval and Langtest**](https://medium.com/john-snow-labs/evaluating-long-form-responses-with-prometheus-eval-and-langtest-a8279355362e) | Prometheus-Eval and LangTest unite to offer an open-source, reliable, and cost-effective solution for evaluating long-form responses, combining Prometheus's GPT-4-level performance and LangTest's robust testing framework to provide detailed, interpretable feedback and high accuracy in assessments. |
110+
| [**Ensuring Precision of LLMs in Medical Domain: The Challenge of Drug Name Swapping**](https://medium.com/john-snow-labs/ensuring-precision-of-llms-in-medical-domain-the-challenge-of-drug-name-swapping-d7f4c83d55fd) | Accurate drug name identification is crucial for patient safety. Testing GPT-4o with LangTest's **_drug_generic_to_brand_** conversion test revealed potential errors in predicting drug names when brand names are replaced by ingredients, highlighting the need for ongoing refinement and rigorous testing to ensure medical LLM accuracy and reliability. |
109111

110112

111113

docs/pages/tutorials/miscellaneous_notebooks/miscellaneous_notebooks.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,6 @@ The following table gives an overview of the different tutorial notebooks. In th
4141
| **Generic API-Based Model**: In this section, we discussed how to test API-based models hosted using Ollama, vLLM, and other tools. | Web |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Generic_API-Based_Model_Testing_Demo.ipynb) |
4242
| **Data Augmenter**: In this Notebook, we can allows for streamlined and harness-free data augmentation, making it simpler to enhance your datasets and improve model robustness. | - |NER | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Data_Augmenter_Notebook.ipynb) |
4343
| **Multi-Dataset Prompt Configs**: In this Notebook, we discussed about optimized prompt handling for multiple datasets, allowing users to add custom prompts for each dataset, enabling seamless integration and efficient testing. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/MultiPrompt_MultiDataset.ipynb) |
44-
| **Multi-Model, Multi-Dataset**: In this Notebook, we discussed about testing on multiple models with multiple datasets, allowing users to allows for comprehensive comparisons and performance assessments in a streamlined manner. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multi_Model_Multi_Dataset.ipynb.ipynb) |
44+
| **Multi-Model, Multi-Dataset**: In this Notebook, we discussed about testing on multiple models with multiple datasets, allowing users to allows for comprehensive comparisons and performance assessments in a streamlined manner. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multi_Model_Multi_Dataset.ipynb) |
4545
| **Evaluation_with_Prometheus_Eval**: In this Notebook, we disscussed about integrating the Prometheus model to langtest brings enhanced evaluation capabilities, providing more detailed and insightful metrics for model performance assessment. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Evaluation_with_Prometheus_Eval.ipynb) |
4646
| **Misuse_Test_with_Prometheus_evaluation**: In this Notebook, we discussed about new safety testing features to identify and mitigate potential misuse and safety issues in your models | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Misuse_Test_with_Prometheus_evaluation.ipynb) |

0 commit comments

Comments
 (0)