TALAS API Documentation

Based on:

https://github.com/bangkit-pukulrata/machine-learning/tree/main/model
https://github.com/tantowjy/news-classification/blob/main/website/main.py

Dataset

Dataset LLM: https://www.kaggle.com/datasets/iqbalmaulana/indonesian-news-dataset
Dataset NER: https://github.com/yohanesgultom/nlp-experiments/blob/master/data/ner/training_data.txt


Overview

TALAS is an API-based system for analyzing news with machine learning models, covering bias analysis, hoax detection, ideology detection, clustering, and named entity recognition. The API is built on Google Cloud Platform (GCP) services: App Engine for compute, Cloud SQL (MySQL) for user data storage, and machine learning models (supervised and unsupervised learning).

Routes.py Endpoints (Production\machine-learning\app\routes.py)

1. Bias Detection Endpoint

URL: /bias
Method: POST
Description: Processes text to determine the bias of a news article.

Request:

{
    "content": "string" // Content of the news article
}

Response:

{
    "bias": 0 // 0 = not biased, 1 = biased
}

2. Hoax Detection Endpoint

URL: /hoax
Method: POST
Description: Processes text to determine if the article contains hoaxes.

Request:

{
    "content": "string" // Content of the news article
}

Response:

{
    "hoax": 0.15 // Float: hoax probability (0 to 1)
}

3. Ideology Detection Endpoint

URL: /ideology
Method: POST
Description: Processes text to determine the ideology of a news article.

Request:

{
    "content": "string" // Content of the news article
}

Response:

{
    "ideology": 0 // 0 = liberal, 1 = conservative
}
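The three endpoints above share the same request body. A minimal client sketch using only the Python standard library, assuming the API is reachable at a hypothetical `BASE_URL` (the deployed App Engine address is not given in this document); `build_content_request` and `classify` are illustrative helper names:

```python
import json
import urllib.request

# Hypothetical base URL: replace with the actual deployment address.
BASE_URL = "http://localhost:8080"


def build_content_request(path: str, content: str) -> urllib.request.Request:
    """Build a JSON POST request for /bias, /hoax or /ideology."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(path: str, content: str) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_content_request(path, content)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `classify("/hoax", text)["hoax"]` would return the hoax probability; the response key matches the endpoint name in all three cases.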

Unsupervised Learning Models

1. Cluster Endpoint

URL: /cluster
Method: POST
Description: Assigns a text to one of the clusters based on its content.

Request:

{
    "content": "string" // Content of the news article
}

Response:

{
    "cluster": 3 // Article cluster (0-7)
}

2. Generate Mode Cluster

URL: /modeCluster
Method: POST
Description: Finds the majority (most common) cluster across a set of news articles.

Request:

[
    {
        "title": "string", // Article title
        "content": "string", // Article content
        "embedding": [0.1, 0.2] // Optional: embedding vector, if already computed
    },
    {
        "title": "string", // Article title
        "content": "string" // Article content
    }
]

Response:

{
    "modeCluster": 2 // Most common cluster
}
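Assuming each article already has a cluster ID (for example from /cluster), the majority cluster is the statistical mode of those IDs. A local sketch of that computation; the tie-breaking rule here (the first-seen ID wins) is an assumption, not documented server behavior:

```python
from collections import Counter
from typing import Iterable


def mode_cluster(cluster_ids: Iterable[int]) -> int:
    """Return the most common cluster ID (ties go to the ID seen first)."""
    counts = Counter(cluster_ids)
    if not counts:
        raise ValueError("at least one cluster ID is required")
    # most_common(1) -> [(cluster_id, count)]; equal counts keep first-seen order.
    return counts.most_common(1)[0][0]
```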

Large Language Model (LLM)

1. Generate Embedding Endpoint

URL: /embedding
Method: POST
Description: Generates embeddings for the given texts.

Request:

[
    {
        "title": "string", // Article title
        "content": "string" // Article content
    }
]

Response:

{
    "embedding": [[0.1, 0.2]] // List of embeddings
}
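Articles whose embeddings are already known can be sent with their "embedding" in later calls (for example /process-all) instead of being re-embedded. A sketch that copies the returned vectors back onto the articles, assuming the i-th embedding in the response belongs to the i-th article in the request; `attach_embeddings` is a hypothetical helper:

```python
from typing import Any, Dict, List, Sequence


def attach_embeddings(
    articles: List[Dict[str, Any]], response: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return copies of the articles with their embeddings attached.

    Assumes response["embedding"][i] belongs to articles[i].
    """
    embeddings: Sequence[Sequence[float]] = response["embedding"]
    if len(embeddings) != len(articles):
        raise ValueError("number of embeddings does not match number of articles")
    # float(x) also converts numpy scalars, so numpy rows become plain JSON lists.
    return [
        {**article, "embedding": [float(x) for x in vector]}
        for article, vector in zip(articles, embeddings)
    ]
```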

2. Generate Title Endpoint

URL: /title
Method: POST
Description: Generates a title for a set of news articles.

Request:

[
    {
        "title": "string", // Article title
        "content": "string", // Article content
        "embedding": [0.1, 0.2] // Optional: embedding vector, if already computed
    },
    {
        "title": "string", // Article title
        "content": "string" // Article content
    }
]

Response:

{
    "title": "Generated Title" // Judul yang dihasilkan
}

3. Generate Summary Endpoint

URL: /summary
Method: POST
Description: Generates two summaries (liberal and conservative) of a set of news articles.

Request:

[
    {
        "title": "string", // Article title
        "content": "string", // Article content
        "embedding": [0.1, 0.2] // Optional: embedding vector, if already computed
    },
    {
        "title": "string", // Article title
        "content": "string" // Article content
    }
]

Response:

{
    "summary_liberalism": "string", // Liberal summary
    "summary_conservative": "string" // Conservative summary
}

4. Generate Analysis Endpoint

URL: /analyze
Method: POST
Description: Generates an analysis comparing the liberal and conservative perspectives.

Request:

[
    {
        "title": "string", // Article title
        "content": "string", // Article content
        "embedding": [0.1, 0.2] // Optional: embedding vector, if already computed
    },
    {
        "title": "string", // Article title
        "content": "string" // Article content
    }
]

Response:

{
    "analyze": "string" // Analysis from the two different perspectives
}

5. Clean Text Endpoint

URL: /cleaned
Method: POST
Description: Cleans news text by removing stopwords and applying stemming.

Request:

{
    "content": "string" // or ["string", "string"] for multiple texts
}

Response:

{
    "cleaned": "string" // or ["string", "string"] if the input is an array
}
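Because /cleaned accepts either a single string or a list of strings and mirrors that shape in "cleaned", a client may want to check that the shapes match. A sketch with hypothetical helper names (`cleaned_request`, `read_cleaned`):

```python
from typing import List, Union

Text = Union[str, List[str]]


def cleaned_request(content: Text) -> dict:
    """Build the /cleaned body from one string or a list of strings."""
    if isinstance(content, str):
        return {"content": content}
    if all(isinstance(item, str) for item in content):
        return {"content": list(content)}
    raise TypeError("content must be a string or a list of strings")


def read_cleaned(sent: Text, response: dict) -> Text:
    """Return response["cleaned"], checking it has the same shape as the input."""
    cleaned = response["cleaned"]
    if isinstance(sent, str) != isinstance(cleaned, str):
        raise ValueError("response shape does not match request shape")
    if not isinstance(sent, str) and len(sent) != len(cleaned):
        raise ValueError("response length does not match request length")
    return cleaned
```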

6. Separate Articles Endpoint

URL: /separate
Method: POST
Description: Groups articles by content similarity, using the similarity of their embeddings.

Request:

[
    {
        "title": "string",
        "content": "string",
        "embedding": [0.1, 0.2]
    },
    {
        "title": "string",
        "content": "string",
        "embedding": [0.1, 0.2]
    },
    {
        "title": "string",
        "content": "string",
        "embedding": [0.1, 0.2]
    },
    {
        "title": "string",
        "content": "string",
        "embedding": [0.1, 0.2]
    }
]

Response:

{
    "separate": [0, 1, 0, 1] // Articles at indices 0 and 2 are similar and share group code 0; articles 1 and 3 share group code 1
}
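The "separate" array assigns a group code to each input position. A sketch that turns those codes into groups of article indices (`groups_from_labels` is a hypothetical helper):

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def groups_from_labels(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Map each group code from /separate to the indices of its articles."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)
```

`groups_from_labels([0, 1, 0, 1])` returns `{0: [0, 2], 1: [1, 3]}`, matching the example response above.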

7. Process All Articles

URL: /process-all
Method: POST
Description: Processes the input articles: groups them, then generates a title, cluster/category, summaries, and bias analysis for each group.

Request (if an article has already been embedded, pass its "embedding" as well):

[
  {
    "title": "string",
    "content": "string"
  {
    "title": "string",
    "content": "string",
  }
]

Response (note: the embedding of each article is not returned, so when reprocessing articles that were already embedded, pass their embeddings in the request):

[
  {
    "title": "Generated Group Title",
    "modeCluster": "Cluster/Category Name",
    "summary_liberalism": "Liberal perspective summary",
    "summary_conservative": "Conservative perspective summary",
    "analyze": "Bias and content analysis details"
  }
]
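Since /process-all does not return embeddings, a client reprocessing stored articles has to send them back. A sketch of building that request body, forwarding "embedding" only when it is present; `process_all_payload` is a hypothetical helper, and it sends only the fields shown in the request example because the handling of any other fields is not documented:

```python
import json
from typing import Any, Dict, List


def process_all_payload(articles: List[Dict[str, Any]]) -> str:
    """Serialize articles for /process-all, keeping existing embeddings."""
    payload = []
    for article in articles:
        item: Dict[str, Any] = {"title": article["title"], "content": article["content"]}
        emb = article.get("embedding")
        if emb is not None:
            # tolist() handles numpy arrays; other sequences are copied into a list.
            item["embedding"] = emb.tolist() if hasattr(emb, "tolist") else list(emb)
        payload.append(item)
    return json.dumps(payload)
```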

8. Antipode Articles Endpoint

URL: /antipode
Method: POST
Description: Finds articles that take the opposite viewpoint to the given article.

Request (pass "embedding" for each article if available):

{
    "article": {
        "title": "string",
        "content": "string"
    },
    "df": [
        {
            "title": "string",
            "content": "string"
        }
    ]
}

Response:

["Article Title 1", "Article Title 2"] // Titles of articles with the opposing viewpoint
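A sketch of the /antipode body, with the reference article under "article" and the candidate articles to compare against under "df"; the helper names are hypothetical:

```python
import json
from typing import Any, Dict, List


def _strip(article: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the documented fields: title, content and an optional embedding."""
    item: Dict[str, Any] = {"title": article["title"], "content": article["content"]}
    if article.get("embedding") is not None:
        item["embedding"] = [float(x) for x in article["embedding"]]
    return item


def antipode_payload(article: Dict[str, Any], candidates: List[Dict[str, Any]]) -> str:
    """Build the /antipode body: one reference article plus candidates in "df"."""
    return json.dumps({"article": _strip(article), "df": [_strip(c) for c in candidates]})
```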

Named Entity Recognition (NER)

1. NER API Endpoint

URL: /ner
Method: POST
Description: Detects named entities in text using the NER model.

Request:

[
    {
        "content": "string" // Text to analyze
    }
]

Response:

[
    [
        {"word": "entity", "tag": "B-PER"}
    ]
] // List of detected entities for each input text
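The "B-PER" tag suggests BIO-style labels. Assuming that scheme ("B-X" starts an entity of type X, "I-X" continues it, "O" marks non-entity tokens), a sketch that merges the tokens of one inner list of the response into whole entities; `merge_bio` is a hypothetical helper:

```python
from typing import Dict, List, Optional, Tuple


def merge_bio(tokens: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Join B-/I- tagged tokens into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    current_type: Optional[str] = None
    for token in tokens:
        tag = token["tag"]
        if tag.startswith("I-") and current_type == tag[2:]:
            words.append(token["word"])  # continue the open entity
            continue
        if words:
            entities.append((" ".join(words), current_type))  # close the open entity
        if tag.startswith(("B-", "I-")):
            # A stray I- tag without a matching B- is treated as a new entity.
            words, current_type = [token["word"]], tag[2:]
        else:
            words, current_type = [], None
    if words:
        entities.append((" ".join(words), current_type))
    return entities
```

Since the response holds one token list per input text, it would be applied as `[merge_bio(tokens) for tokens in response]`.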

2. Top Keywords Endpoint

URL: /top_keywords
Method: POST
Description: Finds the most frequent keywords across multiple articles (keywords are detected by NER).

Request:

[
    {
        "keyword": ["string", "string"]
    }
]

Response:

[
    ["keyword1", 10],
    ["keyword2", 7]
] // Pairs of keyword and occurrence count
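A local sketch of the same counting with `collections.Counter`, useful for checking results offline. Whether the server counts a keyword repeated within one article more than once, and how many pairs it returns, is not documented; `n=10` is an arbitrary choice here:

```python
from collections import Counter
from typing import Dict, List


def top_keywords(articles: List[Dict[str, List[str]]], n: int = 10) -> List[list]:
    """Count keywords across articles; return [keyword, count] pairs, most frequent first."""
    counts: Counter = Counter()
    for article in articles:
        counts.update(article.get("keyword", []))
    # Lists (not tuples) mirror the JSON shape of the response.
    return [[word, count] for word, count in counts.most_common(n)]
```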

Database Endpoints (Production\machine-learning\app\db.py)

1. Get Clusters

URL: /get-clusters
Method: GET
Description: Returns the mapping of cluster IDs to human-readable category names.

Response:

{
    "success": true,
    "clusters": {
        "0": "Korupsi",
        "1": "Pemerintahan",
        "2": "Kejahatan",
        "3": "Transportasi",
        "4": "Bisnis",
        "5": "Agama", 
        "6": "Finance",
        "7": "Politik"
    }
}
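Other endpoints (such as /cluster, /get-today-titles and /get-news) return the cluster as an integer, while the keys of "clusters" are JSON strings. A sketch of the lookup (`cluster_name` is a hypothetical helper):

```python
from typing import Any, Dict


def cluster_name(clusters_response: Dict[str, Any], cluster_id: int) -> str:
    """Look up the category name for a numeric cluster ID from /get-clusters."""
    if not clusters_response.get("success"):
        raise RuntimeError("/get-clusters did not report success")
    # JSON object keys are strings, so convert the integer ID before the lookup.
    return clusters_response["clusters"][str(cluster_id)]
```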

2. Get Today's Articles

URL: /get-today-articles
Method: GET
Description: Retrieves all articles published today.
Query Parameters:
- verbose: If 'true', returns complete article data including content and embeddings (default: false)

Response:

{
    "success": true,
    "data": [
        {
            "id": 123,
            "title": "Article Title",
            "url": "https://news-source.com/article",
            "source": "News Source",
            "image": "https://image-url.com/image.jpg",
            "date": "2025-01-05",
            "bias": 0.25,
            "hoax": 0.15,
            "ideology": 0.75,
            "title_index": 45
        }
    ],
    "count": 1
}

3. Get Today's Source Counts

URL: /get-today-source-counts
Method: GET
Description: Returns a count of today's articles grouped by news source.

Response:

{
    "success": true,
    "data": [
        {
            "source": "CNN Indonesia",
            "count": 12
        },
        {
            "source": "Kompas",
            "count": 8
        }
    ],
    "count": 2
}

4. Get Today's Titles

URL: /get-today-titles
Method: GET
Description: Returns all title entries created today.

Response:

{
    "success": true,
    "data": [
        {
            "title_index": 45,
            "title": "Group Title",
            "image": "https://image-url.com/image.jpg",
            "date": "2025-01-05",
            "cluster": 2,
            "all_summary": "Summary of all articles in this group",
            "analysis": "Analysis of the topic from different perspectives"
        }
    ],
    "count": 1
}

5. Get Title Groups

URL: /get-title-groups
Method: GET
Description: Returns all article groups created today with their member articles.

Response:

{
    "success": true,
    "data": {
        "45": [
            {
                "id": 123,
                "title": "First Article in Group",
                "source": "CNN Indonesia"
            },
            {
                "id": 124,
                "title": "Second Article in Group",
                "source": "Kompas"
            }
        ]
    },
    "count": 1
}

6. Fetch Users (UNAVAILABLE)

URL: /users
Method: GET
Description: Fetches a list of MySQL users.

Response:

[
    {
        "user": "string",
        "host": "string"
    }
]

7. Fetch News (UNAVAILABLE)

URL: /news
Method: GET
Description: Fetches a list of news articles from the database.

Response:

{
    "success": true,
    "data": [
        {
            "id": "integer",
            "title": "string",
            "source": "string",
            "url": "string",
            "image": "string",
            "content": "string",
            "embedding": "string",
            "cleaned": "string",
            "title_index": "integer",
            "cluster": "integer",
            "bias": "integer",
            "hoax": "float",
            "ideology": "integer"
        }
    ]
}

8. Test Database Connection (UNAVAILABLE)

URL: /test-connection
Method: GET
Description: Tests the connection to the database and retrieves the list of tables.

Response:

{
    "success": true,
    "tables": [
        {"Tables_in_news": "string"}
    ]
}

9. News Page

URL: /news_page
Method: GET
Description: Fetches news articles with optional date filtering and renders them in an HTML page.
Query Parameters:
- start_date: Start date for filtering (optional).
- end_date: End date for filtering (optional).
Response: Renders an HTML page with news articles.

10. News Article

URL: /news_article
Method: GET
Description: Fetches details of a specific news article and renders it in an HTML page.
Query Parameters:
- title_index: The index of the article to fetch.
Response: Renders an HTML page with the article details.

11. Insert News Page (UNAVAILABLE)

URL: /insert_news_page
Method: GET
Description: Renders a page for inserting news articles.
Response: Renders an HTML page for inserting news.

12. Insert Title (UNAVAILABLE)

URL: /insert-title
Method: POST
Description: Inserts a new title into the database.

Request:

{
    "title": "string",
    "cluster": "string",
    "image": "string",
    "date": "string",
    "summary_liberalism": "string",
    "summary_conservative": "string",
    "analysis": "string"
}

Response:

{
    "success": true,
    "message": "Article inserted successfully",
    "title": {
        "title": "string",
        "cluster": "string",
        "image": "string",
        "date": "string",
        "title_index": "integer"
    }
}

13. Insert Article (UNAVAILABLE)

URL: /insert-article
Method: POST
Description: Inserts a new article into the database.

Request:

{
    "title": "string",
    "source": "string",
    "url": "string",
    "image": "string",
    "date": "string",
    "content": "string"
}

Response:

{
    "success": true,
    "message": "Article inserted successfully",
    "article": {
        "id": "integer",
        "title": "string",
        "source": "string",
        "url": "string",
        "image": "string",
        "date": "string"
    }
}

14. Run Web Crawlers

URL: /run-crawlers
Method: POST
Description: Runs web crawlers to collect news articles from various sources and stores them in the database.

Request:

{
    "pantai": "True" // Optional, default false. If true, runs only antarapantai.py; otherwise runs all other crawlers except antarapantai.py.
}

Response:

{
    "success": true,
    "message": "Crawlers executed and data inserted successfully",
    "total_results": 25,
    "inserted_count": 22
}

15. Update Articles

URL: /update-articles
Method: GET
Description: Processes articles with null embeddings by generating embeddings, cluster assignments, bias, hoax, and ideology classifications.

Response:

{
    "success": true,
    "message": "Successfully processed 15 articles",
    "total_articles": 15
}

16. Group Articles

URL: /group-articles
Method: GET, POST
Description: Groups articles with NULL title_index by using the /separate endpoint to identify similar articles.

Response:

{
    "success": true,
    "message": "Successfully grouped 30 articles into 8 clusters",
    "articles_count": 30,
    "clusters_count": 8
}

17. Process Articles

URL: /process-articles
Method: GET, POST
Description: Processes article groups by generating titles, summaries, analysis, and setting images for each group in the title table.

Response:

{
    "success": true,
    "message": "Successfully processed 8 article groups",
    "total_groups": 10,
    "processed_groups": 8
}
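Read together, endpoints 14 to 17 describe an ingestion pipeline: crawl articles, compute embeddings and classifications, group similar articles, then generate per-group titles and summaries. The order below is inferred from the descriptions, not confirmed by the source; `call` is a hypothetical injected function that performs the HTTP request and returns the decoded JSON:

```python
from typing import Callable, Dict, List, Tuple

# Step order inferred from the endpoint descriptions (an assumption).
PIPELINE: List[Tuple[str, str]] = [
    ("POST", "/run-crawlers"),
    ("GET", "/update-articles"),
    ("GET", "/group-articles"),
    ("GET", "/process-articles"),
]


def run_pipeline(call: Callable[[str, str], Dict]) -> List[Dict]:
    """Run each step in order; stop at the first response without success=true.

    `call(method, path)` performs the request and returns the decoded JSON body;
    it is injected so the sequence can be tested without a server.
    """
    results = []
    for method, path in PIPELINE:
        result = call(method, path)
        results.append(result)
        if not result.get("success"):
            raise RuntimeError(f"{method} {path} failed: {result}")
    return results
```

Here `call` sends no request body, so /run-crawlers would fall back to its documented default for "pantai".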

18. Count Side

URL: /count-side
Method: GET
Description: Counts the number of articles categorized as liberal, conservative, or neutral for a given title index.
Query Parameters:
- title_index: The index of the title to fetch articles for.

Response:

{
    "success": true,
    "counts": {
        "liberal": 10,
        "conservative": 5,
        "neutral": 3
    },
    "total": 18
}
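A sketch turning the counts into percentage shares for display, assuming "total" is the sum of the three counts (as in the example, 10 + 5 + 3 = 18); `side_shares` is a hypothetical helper:

```python
from typing import Dict


def side_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert /count-side counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    if total == 0:
        return {side: 0.0 for side in counts}
    return {side: round(100 * n / total, 1) for side, n in counts.items()}
```

For the example response this gives 55.6 (liberal), 27.8 (conservative) and 16.7 (neutral).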

19. Top News

URL: /top-news
Method: GET
Description: Fetches the top news articles based on the number of articles in each group for the current day.
Query Parameters:
- limit: The maximum number of top news groups to fetch (default is 5).

Response:

{
    "success": true,
    "data": [
        {
            "title_index": 1,
            "title": "Top News Title",
            "image": "image_url",
            "all_summary": "Summary of the news",
            "article_count": 10,
            "counts": {
                "liberal": 5,
                "conservative": 3,
                "neutral": 2
            }
        },
        {
            "title_index": 2,
            "title": "Another Top News Title",
            "image": "image_url",
            "all_summary": "Summary of the news",
            "article_count": 8,
            "counts": {
                "liberal": 4,
                "conservative": 2,
                "neutral": 2
            }
        }
    ]
}

20. Get Cluster News

URL: /get-cluster-news
Method: GET
Description: Fetches news articles belonging to a specific cluster with detailed information.
Query Parameters:
- cluster: The cluster ID to fetch news for.

Response:

{
    "success": true,
    "data": [
        {
            "title_index": 1,
            "title": "Article Title",
            "date": "2025-01-04",
            "all_summary": "Summary of the article content",
            "image": "image_url"
        },
        {
            "title_index": 2,
            "title": "Another Article Title",
            "date": "2025-01-04",
            "all_summary": "Summary of another article content",
            "image": "another_image_url"
        }
    ],
    "total": 2
}

21. Get News

URL: /get-news
Method: GET
Description: Fetches the latest news articles for the current day with their title, image, date, title_index, cluster, and political distribution counts.

Response:

{
    "success": true,
    "data": [
        {
            "title": "News Article Title",
            "image": "image_url",
            "date": "2025-01-05",
            "all_summary": "News summary",
            "title_index": 123,
            "cluster": 4,
            "counts": {
                "liberal": 5,
                "conservative": 3,
                "neutral": 2
            }
        }
    ],
    "total": 1
}

22. Get News Detail

URL: /get-news-detail
Method: GET
Description: Fetches detailed information about a specific news article group including its title, cluster, image, date, summary, analysis, and all related articles.
Query Parameters:
- title_index: The index of the news title to fetch details for.

Response:

{
    "success": true,
    "title": "News Article Title",
    "cluster": 4,
    "image": "image_url",
    "date": "2025-01-05",
    "all_summary": "Comprehensive summary of the news topic",
    "analysis": "Detailed analysis of the news from different perspectives",
    "articles": [
        {
            "title": "Related Article Title",
            "url": "article_url",
            "source": "News Source",
            "date": "2025-01-05",
            "bias": 0.42,
            "hoax": 0.12,
            "ideology": 0.65
        }
    ]
}

23. Search Title

URL: /search-title
Method: GET
Description: Searches for news articles whose titles contain the specified query string.
Query Parameters:
- query: The search term to look for in news titles.

Response:

{
    "success": true,
    "data": [
        {
            "title_index": 123,
            "title": "News Article Title Containing Search Term",
            "date": "2025-01-05",
            "all_summary": "Summary of the article content",
            "image": "image_url"
        }
    ],
    "total": 1
}
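Search terms can contain spaces or characters such as "&", so query parameters should be percent-encoded. A sketch using `urllib.parse.urlencode` with a hypothetical `BASE_URL`; the same helper fits the other GET endpoints that take query parameters (/count-side, /top-news, /get-cluster-news, /get-news-detail):

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with the actual deployment address.
BASE_URL = "http://localhost:8080"


def get_url(path: str, **params) -> str:
    """Build a GET URL with percent-encoded query parameters."""
    return f"{BASE_URL}{path}?{urlencode(params)}" if params else BASE_URL + path
```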
