BigQuery

BQ is fully managed data warehouse
Takes care of underlying infrastructure
Different from a data lake - which is raw, unstructed data
Data warehouse contains structured and organized data
Data sources feed into a data laek, which are processed into your data warehouse for analysis and reporting
BQ has two main features
- Fast SQL Engine for analytics (Analytics)
- Storage layer for database (Storag)
Can also integrate with Vertex AI and other ML workflows

Interactive query service
Supports:
- Web UI
- bq
- REST API
Connectors to other services such as Dataproc
Also run query jobs on data in other locations
- CSV Cloud Storage
- Data in Google Sheets
  - Note: BigQuery is most efficient when working with its own storage service
Controlling the amount of data processed for costs
- Only select columns and data you actually want to process and return
- Focus on critical rows and fields

Divided into project, dataset and tables
Use multiple datasets to separate tables for different analytical domains, use project-level scoping to isolate datasets from each other
Every table has a schema
- Parquet, Firestore export or Datastore export are self-describing
- However you can enter schema manually through JSON file
BQ is a columnar store, designed to process columns, not rows
BQ stores data in distributed, sharded format with auto-optimization, balancing, etc.
You can also partition and cluster your table based on your access patterns to control costs

Last updated 1 year ago