https://github.com/bangkit-pukulrata/machine-learning/tree/main/model
https://github.com/tantowjy/news-classification/blob/main/website/main.py
Dataset LLM: https://www.kaggle.com/datasets/iqbalmaulana/indonesian-news-dataset
Dataset NER: https://github.com/yohanesgultom/nlp-experiments/blob/master/data/ner/training_data.txt
TALAS is an API-based system for analyzing news with machine learning models, covering bias analysis, hoax detection, ideology detection, clustering, and named entity recognition. The API is built on Google Cloud Platform (GCP) services: App Engine for compute, Cloud SQL (MySQL) for storing user data, and machine learning models (supervised and unsupervised learning).
- URL:
/bias - Method: POST
- Description: Processes text to determine the bias of a news article.
- Request:
{ "content": "string" // Content of the news article } - Response:
{ "bias": 0 // 0 = not biased, 1 = biased }
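The POST endpoints in this document take a JSON body. A minimal sketch of building such a request with Python's standard library is shown here. `BASE_URL` is a placeholder (the deployed host is not stated in this document), `build_post` is a hypothetical helper name, and the `Content-Type: application/json` header is an assumption about what the server expects. The request is only constructed, not sent.

```python
import json
import urllib.request

# Placeholder host: the real deployment URL is not given in this document.
BASE_URL = "https://example.com"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for an endpoint such as /bias."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        # Assumption: the API reads the body as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_post("/bias", {"content": "Isi artikel berita"})
# To send it: urllib.request.urlopen(req), then json.load() the response.
```

The same helper should work for `/hoax`, `/ideology`, and the other POST endpoints by changing the path and payload.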
- URL:
/hoax - Method: POST
- Description: Processes text to determine if the article contains hoaxes.
- Request:
{ "content": "string" // Content of the news article } - Response:
{ "hoax": "float" // Hoax probability (0 to 1) }
- URL:
/ideology - Method: POST
- Description: Processes text to determine the ideology of a news article.
- Request:
{ "content": "string" // Content of the news article } - Response:
{ "ideology": 0 // 0 or 1, "liberal" or "conservative" }
- URL:
/cluster - Method: POST
- Description: Groups the text into a cluster based on its content.
- Request:
{ "content": "string" // Content of the news article } - Response:
{ "cluster": 3 // Article cluster (0-7) }
- URL:
/modeCluster - Method: POST
- Description: Finds the majority cluster across a set of news articles.
- Request:
[ { "title": "string", // Article title "content": "string", // Article content "embedding": numpy array }, { "title": "string", // Article title "content": "string" // Article content } ] - Response:
{ "modeCluster": 2 // Most common cluster }
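The idea of a "mode cluster" can be illustrated locally. The sketch below is not the service's implementation; it only shows how the most common value among per-article cluster IDs could be picked. `mode_cluster` is a hypothetical name.

```python
from collections import Counter


def mode_cluster(cluster_ids):
    """Return the most common cluster ID; ties go to the ID seen first."""
    if not cluster_ids:
        raise ValueError("cluster_ids must not be empty")
    return Counter(cluster_ids).most_common(1)[0][0]


# Six articles, three of which fall in cluster 2.
result = mode_cluster([2, 5, 2, 7, 2, 5])
print(result)
```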
- URL:
/embedding - Method: POST
- Description: Generates embeddings for the given text.
- Request:
[ { "title": "string", // Article title "content": "string" // Article content } ] - Response:
{ "embedding": [[0.1, 0.2]] // List of embeddings }
- URL:
/title - Method: POST
- Description: Generates a title from a set of news articles.
- Request:
[ { "title": "string", // Article title "content": "string", // Article content "embedding": numpy array }, { "title": "string", // Article title "content": "string" // Article content } ] - Response:
{ "title": "Generated Title" // Generated title }
- URL:
/summary - Method: POST
- Description: Produces two summaries (liberal and conservative) from a set of news articles.
- Request:
[ { "title": "string", // Article title "content": "string", // Article content "embedding": numpy array }, { "title": "string", // Article title "content": "string" // Article content } ] - Response:
{ "summary_liberalism": "string", // Liberal summary "summary_conservative": "string" // Conservative summary }
- URL:
/analyze - Method: POST
- Description: Generates a comparative analysis of the liberal and conservative perspectives.
- Request:
[ { "title": "string", // Article title "content": "string", // Article content "embedding": numpy array }, { "title": "string", // Article title "content": "string" // Article content } ] - Response:
{ "analyze": "string" // Analysis from the two perspectives }
- URL:
/cleaned - Method: POST
- Description: Cleans news text by removing stopwords and applying stemming.
- Request:
{ "content": "string" // or ["string", "string"] for multiple texts } - Response:
{ "cleaned": "string" // or ["string", "string"] if the input is an array }
- URL:
/separate - Method: POST
- Description: Separates articles by content similarity, using embedding similarity.
- Request:
[ { "title": "string", "content": "string", "embedding": numpy array }, { "title": "string", "content": "string", "embedding": numpy array }, { "title": "string", "content": "string", "embedding": numpy array }, { "title": "string", "content": "string", "embedding": numpy array } ] - Response:
{ "separate": [0, 1, 0, 1] // Articles at indices 0 and 2 are similar and share group code "0" }
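The `separate` list is positional: entry `i` is the group code of the `i`-th article in the request. A small client-side sketch for turning that list into groups of article indices follows; `group_indices` is a hypothetical helper, not part of the API.

```python
from collections import defaultdict


def group_indices(labels):
    """Map each group code to the indices of the articles that carry it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)


response = {"separate": [0, 1, 0, 1]}
groups = group_indices(response["separate"])
print(groups)
```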
- URL:
/process-all - Method: POST
- Description: Processes input articles to group them and generate a title, cluster/category, summaries, and bias analysis for each group.
- Request | If an article has already been embedded, pass its "embedding" too.
[ { "title": "string", "content": "string" }, { "title": "string", "content": "string" } ] - Response | Warning: Does not return the embedding of each article.
[ { "title": "Generated Group Title", "modeCluster": "Cluster/Category Name", "summary_liberalism": "Liberal perspective summary", "summary_conservative": "Conservative perspective summary", "analyze": "Bias and content analysis details" } ]
- URL:
/antipode - Method: POST
- Description: Finds articles whose viewpoint opposes that of the given article.
- Request | Pass embedding if available:
{ "article": { "title": "string", "content": "string" }, "df": [ { "title": "string", "content": "string" } ] } - Response:
["Article Title 1", "Article Title 2"] // Titles of articles with opposing viewpoints
- URL:
/ner - Method: POST
- Description: Detects named entities in text using an NER model.
- Request:
[ { "content": "string" // Text to analyze } ] - Response:
[ [ {"word": "entity", "tag": "B-PER"} ] ] // List of detected entities for each text
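The response carries one tag per word, so multi-word names need to be reassembled on the client. The sketch below assumes BIO-style tags (`B-` opens an entity, `I-` continues it, `O` is outside). Only `B-PER` appears in the example above, so the `I-PER`, `B-LOC`, and `O` tags used here are assumptions, and `merge_entities` is a hypothetical helper.

```python
def merge_entities(tokens):
    """Merge BIO-tagged tokens into (text, type) pairs."""
    entities = []
    words, kind = [], None
    for token in tokens:
        tag = token["tag"]
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if words:
                entities.append((" ".join(words), kind))
            words, kind = [token["word"]], tag[2:]
        elif tag.startswith("I-") and words and tag[2:] == kind:
            # Continuation of the current entity.
            words.append(token["word"])
        else:
            # Outside any entity (or a stray I- tag): flush and reset.
            if words:
                entities.append((" ".join(words), kind))
            words, kind = [], None
    if words:
        entities.append((" ".join(words), kind))
    return entities


sample = [
    {"word": "Joko", "tag": "B-PER"},
    {"word": "Widodo", "tag": "I-PER"},
    {"word": "di", "tag": "O"},
    {"word": "Jakarta", "tag": "B-LOC"},
]
entities = merge_entities(sample)
print(entities)
```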
- URL:
/top_keywords - Method: POST
- Description: Finds the most frequent keywords across several articles (keywords are detected by NER).
- Request:
[ { "keyword": ["string", "string"] } ] - Response:
[ ["keyword1", 10], ["keyword2", 7] ] // Keyword and occurrence-count pairs
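To make the request and response shapes concrete, the counting can be reproduced locally. This is a sketch of the idea, not the endpoint's actual code; `top_keywords` and its `limit` parameter are hypothetical.

```python
from collections import Counter


def top_keywords(articles, limit=10):
    """Count keywords across articles and return [keyword, count] pairs."""
    counts = Counter()
    for article in articles:
        counts.update(article["keyword"])
    return [[word, n] for word, n in counts.most_common(limit)]


articles = [
    {"keyword": ["KPK", "Jakarta"]},
    {"keyword": ["KPK"]},
]
pairs = top_keywords(articles)
print(pairs)
```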
- URL:
/get-clusters - Method: GET
- Description: Returns the mapping of cluster IDs to human-readable category names.
- Response:
{ "success": true, "clusters": { "0": "Korupsi", "1": "Pemerintahan", "2": "Kejahatan", "3": "Transportasi", "4": "Bisnis", "5": "Agama", "6": "Finance", "7": "Politik" } }
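Because JSON object keys are always strings, a numeric cluster ID (as returned by `/cluster`) has to be converted before looking it up in this mapping. A minimal sketch, where `cluster_name` and the `"Unknown"` fallback are assumptions:

```python
clusters_response = {
    "success": True,
    "clusters": {
        "0": "Korupsi", "1": "Pemerintahan", "2": "Kejahatan",
        "3": "Transportasi", "4": "Bisnis", "5": "Agama",
        "6": "Finance", "7": "Politik",
    },
}


def cluster_name(response, cluster_id):
    """Look up a category name by integer or string cluster ID."""
    return response["clusters"].get(str(cluster_id), "Unknown")


name = cluster_name(clusters_response, 3)
print(name)
```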
- URL:
/get-today-articles - Method: GET
- Description: Retrieves all articles published today.
- Query Parameters:
verbose: If 'true', returns complete article data including content and embeddings (default: false)
- Response:
{ "success": true, "data": [ { "id": 123, "title": "Article Title", "url": "https://news-source.com/article", "source": "News Source", "image": "https://image-url.com/image.jpg", "date": "2025-01-05", "bias": 0.25, "hoax": 0.15, "ideology": 0.75, "title_index": 45 } ], "count": 1 }
- URL:
/get-today-source-counts - Method: GET
- Description: Returns a count of today's articles grouped by news source.
- Response:
{ "success": true, "data": [ { "source": "CNN Indonesia", "count": 12 }, { "source": "Kompas", "count": 8 } ], "count": 2 }
- URL:
/get-today-titles - Method: GET
- Description: Returns all title entries created today.
- Response:
{ "success": true, "data": [ { "title_index": 45, "title": "Group Title", "image": "https://image-url.com/image.jpg", "date": "2025-01-05", "cluster": 2, "all_summary": "Summary of all articles in this group", "analysis": "Analysis of the topic from different perspectives" } ], "count": 1 }
- URL:
/get-title-groups - Method: GET
- Description: Returns all article groups created today with their member articles.
- Response:
{ "success": true, "data": { "45": [ { "id": 123, "title": "First Article in Group", "source": "CNN Indonesia" }, { "id": 124, "title": "Second Article in Group", "source": "Kompas" } ] }, "count": 1 }
- URL:
/users - Method: GET
- Description: Fetches a list of MySQL users.
- Response:
[ { "user": "string", "host": "string" } ]
- URL:
/news - Method: GET
- Description: Fetches a list of news articles from the database.
- Response:
{ "success": true, "data": [ { "id": "integer", "title": "string", "source": "string", "url": "string", "image": "string", "content": "string", "embedding": "string", "cleaned": "string", "title_index": "integer", "cluster": "integer", "bias": "integer", "hoax": "float", "ideology": "integer" } ] }
- URL:
/test-connection - Method: GET
- Description: Tests the connection to the database and retrieves the list of tables.
- Response:
{ "success": true, "tables": [ {"Tables_in_news": "string"} ] }
- URL:
/news_page - Method: GET
- Description: Fetches news articles with optional date filtering and renders them in an HTML page.
- Query Parameters:
start_date: Start date for filtering (optional).
end_date: End date for filtering (optional).
- Response: Renders an HTML page with news articles.
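A short sketch of building the URL for this page with optional date filters. `BASE_URL` is a placeholder, `news_page_url` is a hypothetical helper, and the `YYYY-MM-DD` date format is an assumption based on the dates shown in other responses in this document.

```python
from urllib.parse import urlencode

# Placeholder host: the real deployment URL is not given in this document.
BASE_URL = "https://example.com"


def news_page_url(start_date=None, end_date=None):
    """Build the /news_page URL, including only the filters that are set."""
    params = {}
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    query = urlencode(params)
    return f"{BASE_URL}/news_page" + (f"?{query}" if query else "")


url = news_page_url("2025-01-01", "2025-01-05")
print(url)
```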
- URL:
/news_article - Method: GET
- Description: Fetches details of a specific news article and renders it in an HTML page.
- Query Parameters:
title_index: The index of the article to fetch.
- Response: Renders an HTML page with the article details.
- URL:
/insert_news_page - Method: GET
- Description: Renders a page for inserting news articles.
- Response: Renders an HTML page for inserting news.
- URL:
/insert-title - Method: POST
- Description: Inserts a new title into the database.
- Request:
{ "title": "string", "cluster": "string", "image": "string", "date": "string", "summary_liberalism": "string", "summary_conservative": "string", "analysis": "string" } - Response:
{ "success": true, "message": "Article inserted successfully", "title": { "title": "string", "cluster": "string", "image": "string", "date": "string", "title_index": "integer" } }
- URL:
/insert-article - Method: POST
- Description: Inserts a new article into the database.
- Request:
{ "title": "string", "source": "string", "url": "string", "image": "string", "date": "string", "content": "string" } - Response:
{ "success": true, "message": "Article inserted successfully", "article": { "id": "integer", "title": "string", "source": "string", "url": "string", "image": "string", "date": "string" } }
- URL:
/run-crawlers - Method: POST
- Description: Runs web crawlers to collect news articles from various sources and stores them in the database.
- Request:
{ "pantai": "True" // Default: false. If true, runs only antarapantai.py; otherwise runs all other crawlers and skips it. } - Response:
{ "success": true, "message": "Crawlers executed and data inserted successfully", "total_results": 25, "inserted_count": 22 }
- URL:
/update-articles - Method: GET
- Description: Processes articles with null embeddings by generating embeddings, cluster assignments, bias, hoax, and ideology classifications.
- Response:
{ "success": true, "message": "Successfully processed 15 articles", "total_articles": 15 }
- URL:
/group-articles - Method: GET, POST
- Description: Groups articles with NULL title_index by using the /separate endpoint to identify similar articles.
- Response:
{ "success": true, "message": "Successfully grouped 30 articles into 8 clusters", "articles_count": 30, "clusters_count": 8 }
- URL:
/process-articles - Method: GET, POST
- Description: Processes article groups by generating titles, summaries, analysis, and setting images for each group in the title table.
- Response:
{ "success": true, "message": "Successfully processed 8 article groups", "total_groups": 10, "processed_groups": 8 }
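Read together, the descriptions of `/run-crawlers`, `/update-articles`, `/group-articles`, and `/process-articles` suggest a pipeline in that order: crawl, enrich, group, then summarize. The order is inferred from the descriptions, not stated outright. The sketch below runs the steps through an injected `call(method, path)` function (a hypothetical interface) and stops at the first step that does not report success; `fake_call` stands in for real HTTP calls.

```python
# Assumed order, inferred from the endpoint descriptions.
PIPELINE = [
    ("POST", "/run-crawlers"),
    ("GET", "/update-articles"),
    ("GET", "/group-articles"),
    ("GET", "/process-articles"),
]


def run_pipeline(call):
    """Call each step in order; stop after the first unsuccessful response."""
    results = []
    for method, path in PIPELINE:
        response = call(method, path)
        results.append((path, response))
        if not response.get("success"):
            break
    return results


def fake_call(method, path):
    """Stand-in for an HTTP client: pretends the grouping step fails."""
    return {"success": path != "/group-articles"}


results = run_pipeline(fake_call)
print([path for path, _ in results])
```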
- URL:
/count-side - Method: GET
- Description: Counts the number of articles categorized as liberal, conservative, or neutral for a given title index.
- Query Parameters:
title_index: The index of the title to fetch articles for.
- Response:
{ "success": true, "counts": { "liberal": 10, "conservative": 5, "neutral": 3 }, "total": 18 }
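A client will often want these counts as percentages, for example to draw a distribution bar. A minimal sketch using the example response above; `side_shares` is a hypothetical helper.

```python
def side_shares(response):
    """Convert per-side article counts into percentages of the total."""
    total = response["total"]
    if total == 0:
        return {side: 0.0 for side in response["counts"]}
    return {
        side: round(100 * count / total, 1)
        for side, count in response["counts"].items()
    }


response = {
    "success": True,
    "counts": {"liberal": 10, "conservative": 5, "neutral": 3},
    "total": 18,
}
shares = side_shares(response)
print(shares)
```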
- URL:
/top-news - Method: GET
- Description: Fetches the top news articles based on the number of articles in each group for the current day.
- Query Parameters:
limit: The maximum number of top news groups to fetch (default is 5).
- Response:
{ "success": true, "data": [ { "title_index": 1, "title": "Top News Title", "image": "image_url", "all_summary": "Summary of the news", "article_count": 10, "counts": { "liberal": 5, "conservative": 3, "neutral": 2 } }, { "title_index": 2, "title": "Another Top News Title", "image": "image_url", "all_summary": "Summary of the news", "article_count": 8, "counts": { "liberal": 4, "conservative": 2, "neutral": 2 } } ] }
- URL:
/get-cluster-news - Method: GET
- Description: Fetches news articles belonging to a specific cluster with detailed information.
- Query Parameters:
cluster: The cluster ID to fetch news for.
- Response:
{ "success": true, "data": [ { "title_index": 1, "title": "Article Title", "date": "2025-01-04", "all_summary": "Summary of the article content", "image": "image_url" }, { "title_index": 2, "title": "Another Article Title", "date": "2025-01-04", "all_summary": "Summary of another article content", "image": "another_image_url" } ], "total": 2 }
- URL:
/get-news - Method: GET
- Description: Fetches the latest news articles for the current day with their title, image, date, title_index, cluster, and political distribution counts.
- Response:
{ "success": true, "data": [ { "title": "News Article Title", "image": "image_url", "date": "2025-01-05", "all_summary": "News summary", "title_index": 123, "cluster": 4, "counts": { "liberal": 5, "conservative": 3, "neutral": 2 } } ], "total": 1 }
- URL:
/get-news-detail - Method: GET
- Description: Fetches detailed information about a specific news article group including its title, cluster, image, date, summary, analysis, and all related articles.
- Query Parameters:
title_index: The index of the news title to fetch details for.
- Response:
{ "success": true, "title": "News Article Title", "cluster": 4, "image": "image_url", "date": "2025-01-05", "all_summary": "Comprehensive summary of the news topic", "analysis": "Detailed analysis of the news from different perspectives", "articles": [ { "title": "Related Article Title", "url": "article_url", "source": "News Source", "date": "2025-01-05", "bias": 0.42, "hoax": 0.12, "ideology": 0.65 } ] }
- URL:
/search-title - Method: GET
- Description: Searches for news articles whose titles contain the specified query string.
- Query Parameters:
query: The search term to look for in news titles.
- Response:
{ "success": true, "data": [ { "title_index": 123, "title": "News Article Title Containing Search Term", "date": "2025-01-05", "all_summary": "Summary of the article content", "image": "image_url" } ], "total": 1 }