Google BigQuery, released on May 19, 2010, enables companies to handle large amounts of data without having to manage infrastructure. Google's documentation describes it as a "serverless architecture [that] lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go, and federated queries let it read data from external sources.
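As an illustration of the client libraries mentioned above, here is a minimal sketch (not an official sample) that runs a SQL query with the google-cloud-bigquery Java library. It assumes Application Default Credentials are configured and uses the public dataset bigquery-public-data.usa_names.usa_1910_2013; the class name is arbitrary.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class BigQueryQuickstart {
  public static void main(String[] args) throws Exception {
    // Client built from Application Default Credentials and the default project.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Standard SQL against a BigQuery public dataset.
    QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(
            "SELECT name, SUM(number) AS total "
                + "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
                + "GROUP BY name ORDER BY total DESC LIMIT 10")
        .build();

    // Run the query and print each result row.
    TableResult result = bigquery.query(queryConfig);
    result.iterateAll().forEach(row ->
        System.out.printf("%s: %d%n",
            row.get("name").getStringValue(),
            row.get("total").getLongValue()));
  }
}
```

The same query pattern applies to federated sources, for example tables exposed through BigQuery's EXTERNAL_QUERY function.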
📖 A highly rated, comprehensive reference is the book "Google BigQuery: The Definitive Guide".
Another enriching read is the article by BigQuery's founding product manager, telling the inside story of the product on its 10th anniversary.
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
BigQuery data source for Apache Spark: read data from BigQuery into DataFrames, write DataFrames into BigQuery tables (a minimal Java sketch follows this list).
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
CATA.Search: blockchain database, CATA metadata query
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Official repository of SquashQL, the SQL query engine for multi-dimensional and hierarchical analysis that empowers your SQL database
Rewrite BigQuery, Redshift, Snowflake, and Databricks queries into DuckDB-compatible SQL (with deep transformation of functions, data types, and format characters) using Java.
Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub
Convenient Dataflow pipelines for transforming data between cloud data sources
Use Remote Functions to tokenize data with DLP in BigQuery using SQL
Replicates any database (CDC events) to BigQuery in real time
Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale
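To make the Spark connector entry above concrete, here is a minimal Java sketch of reading a public BigQuery table into a DataFrame and writing an aggregate back. It assumes the spark-bigquery connector is on the classpath (e.g. via the spark-bigquery-with-dependencies package); the output dataset, table, and staging bucket names are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkBigQueryExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("spark-bigquery-sketch")
        .getOrCreate();

    // Read a BigQuery public table into a DataFrame via the connector.
    Dataset<Row> shakespeare = spark.read()
        .format("bigquery")
        .load("bigquery-public-data.samples.shakespeare");

    // Aggregate word counts across all corpora.
    Dataset<Row> wordCounts = shakespeare
        .groupBy("word")
        .sum("word_count");

    // Write the DataFrame back to BigQuery; dataset, table, and staging
    // bucket names below are placeholders.
    wordCounts.write()
        .format("bigquery")
        .option("temporaryGcsBucket", "my-staging-bucket")
        .save("my_dataset.shakespeare_word_counts");

    spark.stop();
  }
}
```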