In this repository we train an LLM to be an expert on galaxy clusters using a curated set of scientific articles on the topic.
This app can be found running at https://llm-galaxyclusters.streamlit.app/.
- Create a `.env` file containing `OPENAI_API_KEY`.
- Run `streamlit run LLM-GalaxyClusters.py`.
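
Before launching the app, it can help to confirm that the `.env` file actually defines `OPENAI_API_KEY`. The snippet below is a minimal, standard-library-only sketch of such a check; the helper names `read_env_file` and `has_openai_key` are hypothetical and not part of this repository, and the parser assumes simple `KEY=VALUE` lines.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; the app itself may load the file differently.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_openai_key(path=".env"):
    """Return True if the file defines a non-empty OPENAI_API_KEY."""
    return bool(read_env_file(path).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    if has_openai_key():
        print("OPENAI_API_KEY found in .env")
    else:
        print("OPENAI_API_KEY is missing; add it to .env before running the app")
```
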
This code has two main components:
- The code ingests a curated set of scientific articles on galaxy clusters in the form of PDFs. It then creates vector embeddings for these articles and saves them in a `chromadb.sqlite3` database.
- The code uses `LangChain` to call an LLM (chatgpt-3.5-turbo) to generate answers to the user's questions. The responses are augmented using retrieval-augmented generation (RAG), which grounds the LLM's answers in the ingested PDFs.
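
The two components above can be illustrated with a toy, standard-library-only sketch of the RAG flow: embed text chunks, retrieve the ones most similar to the question, and build a prompt that restricts the LLM to that context. This is not the repository's implementation. The word-count "embedding" stands in for a real embedding model, an in-memory list stands in for the vector database, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    question_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that asks the LLM to answer from retrieved context only."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical text chunks, standing in for passages extracted from the PDFs.
    chunks = [
        "Galaxy clusters contain hot X-ray emitting gas.",
        "Streamlit is a Python framework for web apps.",
        "The intracluster medium is hot gas in galaxy clusters.",
    ]
    print(build_prompt("What gas is found in galaxy clusters?", chunks))
```

In a real pipeline, the assembled prompt would be sent to the LLM, and the retrieval step would query the persisted vector database rather than re-embedding every chunk per question.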
If you are interested in contributing to the code or the base of PDFs, please contact me via [email protected] or leave a GitHub issue.
The list of scientific articles used to train the LLM can be found in LLM-GalaxyClusters.bib.