This article contains my notes on Google Cloud Bigtable and is part of my series on the Professional Cloud Architect (PCA) certification.

I plan to take the Professional Cloud Database Engineer certification, so I am starting my training for the PCA with a database deep dive.

I do not work with Bigtable often, but when I do, I need to get up to speed quickly. The typical GCP budget for a Bigtable deployment starts at $5K per month, so investing a couple of hours in a refresher is a good idea. These notes minimize the time I spend searching for resources.

Note: this article is a work in progress while I train for the PCA and Database exams.

What is Cloud Bigtable

Cloud Bigtable is Google’s fully managed, scalable NoSQL database service. If you are familiar with Hadoop, note that Cloud Bigtable is compatible with the Apache HBase API.

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and millions of columns, enabling you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key.

In Bigtable, each row represents a single entity (such as an individual user or sensor) and is labeled with a unique row key. Each column stores attribute values for each row, and column families can be used to organize related columns. At the intersection of a row and column, there can be multiple cells, with each cell representing a different version of the data at a given timestamp.
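To make the row / column family / column / cell hierarchy concrete, here is a minimal in-memory sketch in Python. It is only a toy model of the layout described above, not how Bigtable is implemented, and the class and method names (`SparseTable`, `write`, `read_latest`, `read_versions`) are my own inventions for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> column family -> column qualifier -> cells."""

    def __init__(self):
        # Nested dicts are created only when something is written, so a row
        # stores just the columns it actually uses (the table is sparse).
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def write(self, row_key, family, qualifier, value, timestamp):
        cells = self.rows[row_key][family][qualifier]
        cells.append((timestamp, value))
        # Keep the newest version first.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def read_versions(self, row_key, family, qualifier):
        # .get() avoids creating empty entries for rows that do not exist.
        return list(self.rows.get(row_key, {}).get(family, {}).get(qualifier, []))

    def read_latest(self, row_key, family, qualifier):
        cells = self.read_versions(row_key, family, qualifier)
        return cells[0][1] if cells else None


table = SparseTable()
table.write("sensor#001", "metrics", "temp_c", 21.5, timestamp=1000)
table.write("sensor#001", "metrics", "temp_c", 22.1, timestamp=2000)
table.write("sensor#002", "meta", "location", "roof", timestamp=1500)

print(table.read_latest("sensor#001", "metrics", "temp_c"))
print(table.read_versions("sensor#001", "metrics", "temp_c"))
```

The two writes to the same row and column produce two timestamped cells rather than overwriting each other, and `sensor#002` carries no `metrics` family at all.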

Notes

  • There is an inconsistency in the documentation. In some places, it says thousands of columns, in other places millions of columns.
  • Even though a table can have millions of columns, a row should not.
  • A row key must be 4 KB or less
  • Does not support joins
  • Transactions are supported only within a single row
  • Each table has only one index, the row key
  • The intersection of a row and column can contain multiple timestamped cells
  • Tables are sparse

History

Google Research in Data Technologies

There is a massive amount of knowledge in these research papers. If your goal is to better understand how data is managed and processed at Google, read a few of them. Colossus is probably the most important.

Products that use Cloud Bigtable

Bigtable is used internally at Google for a number of products.

  • Gmail
  • Google Analytics
  • Google Blogger
  • Google Books
  • Google Code
  • Google Earth
  • Google Maps
  • YouTube

Apache HBase and Apache Cassandra are some of the best-known open source projects that were modeled after Bigtable.

As of January 2022, Bigtable manages over 10 exabytes of data and serves more than 5 billion requests per second.

Key Features

  • Fully managed NoSQL database
  • Horizontally scalable
  • Optimized for high reads/writes per second
  • Supports millions of requests per second with low, single-digit millisecond latency
  • Consistency: a single-cluster instance is strongly consistent; a replicated instance is eventually consistent by default
  • SLO/SLA
    • 99.9% – Cloud Bigtable – Zonal instance (single cluster)
    • 99.999% – Replicated instance (2 or more clusters) with multi-cluster routing policy (3 or more regions)
  • Millions of columns
  • Column Families
  • Column Cells
  • Up to 256 MB per row
  • Integration with Big Data Tools
    • Apache HBase API Standard
    • Apache Beam
    • Apache Hadoop
    • Apache Spark
  • Integration with Google Products
    • BigQuery
    • Cloud Dataflow
    • Cloud Dataproc
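To get a feel for what the SLA percentages in the list above mean in practice, they can be converted into an allowed-downtime budget. This is plain arithmetic, sketched in Python; the 30-day month and the function name are my assumptions, and the SLA document itself defines how downtime is actually measured.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_percent,
                             period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Minutes of downtime permitted per period at a given availability."""
    return period_minutes * (1 - availability_percent / 100)


for label, pct in [("Zonal instance", 99.9),
                   ("Replicated, multi-cluster routing", 99.999)]:
    minutes = allowed_downtime_minutes(pct)
    print(f"{label}: {pct}% -> {minutes:.2f} min/month ({minutes * 60:.0f} s)")
```

Under these assumptions the zonal budget is roughly 43 minutes per month, while the replicated configuration allows well under a minute.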

Architecture

  • Horizontally scalable
  • Throughput can be adjusted by adding or removing nodes
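  • Each node provides up to roughly 10,000 queries per second (the documented figure for SSD storage; actual throughput depends on the workload)
  • Nodes do not store data themselves; data lives in the shared storage backend (Colossus), and each node holds pointers to the tablets it serves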
  • No downtime while changing nodes
  • Components
    • Frontend Server Pool
    • Bigtable Cluster
      • Nodes
    • Scalable Storage Backend
      • Data sharded into multiple “tablets”
        • A Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Each tablet is associated with a node, and operations on these rows are performed on the node. To optimize performance, tablets are split or moved to a different node depending on access patterns. Based on user access patterns — read, write, and scan operations — tablets are rebalanced across the nodes.
      • SSTable – Sorted String Table
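The tablet idea above can be sketched in a few lines of Python: a sorted list of split keys divides the row-key space into contiguous ranges, and each range is assigned to a node. This is a simplified illustration only. The `TabletMap` class, the round-robin assignment, and the node names are hypothetical, and the real service decides splits and moves from observed access patterns.

```python
import bisect


class TabletMap:
    """Toy mapping of contiguous row-key ranges (tablets) to nodes."""

    def __init__(self, split_keys, nodes):
        # N split keys produce N + 1 tablets; each split key is the first
        # row key of the tablet to its right.
        self.split_keys = sorted(split_keys)
        tablet_count = len(self.split_keys) + 1
        # Hypothetical starting layout: assign tablets round-robin.
        self.assignment = [nodes[i % len(nodes)] for i in range(tablet_count)]

    def tablet_index(self, row_key):
        # Binary search finds the range that contains the row key.
        return bisect.bisect_right(self.split_keys, row_key)

    def node_for(self, row_key):
        return self.assignment[self.tablet_index(row_key)]

    def move_tablet(self, index, node):
        # Rebalancing changes which node serves a tablet; no data is copied
        # in this model, only the pointer changes.
        self.assignment[index] = node


tablets = TabletMap(split_keys=["g", "p"], nodes=["node-a", "node-b"])
for key in ["apple", "melon", "zebra"]:
    print(key, "->", tablets.node_for(key))

# Simulate moving the busy last tablet to the other node.
tablets.move_tablet(tablets.tablet_index("zebra"), "node-b")
print("zebra ->", tablets.node_for("zebra"))
```

Because rows are kept in sorted order, rows with nearby keys land in the same tablet, which is why row key design has such a strong effect on load distribution.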

Use cases

  • Key usage: Lots of data over lots of time (days)
  • Time-series data
  • Marketing data
  • Low latency serving
  • Financial data
  • IoT data
  • Graph data

Best practices

  • Understand Cloud Bigtable performance – estimating throughput for Cloud Bigtable, how to plan Cloud Bigtable capacity by looking at throughput and storage use, how enabling replication affects read and write throughput differently, and how Cloud Bigtable optimizes data over time.
  • Cloud Bigtable schema design – guidance on designing Cloud Bigtable schema, including concepts of key/value store, designing row keys based on planned read requests, handling columns and rows, and special use cases.
  • Cloud Bigtable replication overview – how to replicate Cloud Bigtable across multiple zones or regions, understand performance implications of replication, and how Cloud Bigtable resolves conflicts and handles failovers.
  • About Bigtable backups – how to save a copy of a table’s schema and data with Bigtable Backups, which can help you recover from application-level data corruption or from operator errors, such as accidentally deleting a table.
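As an illustration of designing row keys around planned reads, here is a small Python sketch for a time-series case where the common query is "latest readings for one sensor". It assumes a key layout of `sensor_id#reversed_timestamp`, which is one common pattern and not the only valid design. The helper names and the `MAX_TS` bound are hypothetical, and a sorted Python list stands in for the table's sorted row keys.

```python
import bisect

MAX_TS = 10**13 - 1            # hypothetical upper bound for millisecond timestamps
MAX_ROW_KEY_BYTES = 4 * 1024   # row key size limit from the notes above


def make_row_key(sensor_id, ts_millis):
    """Build a key that sorts the newest reading of each sensor first."""
    reversed_ts = MAX_TS - ts_millis
    key = f"{sensor_id}#{reversed_ts:013d}"  # zero-padded so string order = numeric order
    if len(key.encode("utf-8")) > MAX_ROW_KEY_BYTES:
        raise ValueError("row key exceeds 4 KB")
    return key


def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with prefix, reading one contiguous range."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


keys = sorted([
    make_row_key("sensor-1", 1000),
    make_row_key("sensor-1", 3000),
    make_row_key("sensor-1", 2000),
    make_row_key("sensor-10", 1500),
])
print(prefix_scan(keys, "sensor-1#"))
```

The trailing `#` in the prefix keeps `sensor-1` from also matching `sensor-10`, and the reversed timestamp means the first key returned is the most recent reading.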

Cloud Bigtable pricing

When estimating the price of a Bigtable deployment, review the following major items:

  • The type of Bigtable instance and the total number of nodes in your instance’s clusters
  • The amount of storage that your tables use
  • The amount of network bandwidth that you use
  • Data Access audit log costs, if enabled. This item is often overlooked.
  • Backup storage
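The items above can be combined into a rough monthly estimate. The Python sketch below is only a back-of-the-envelope calculator: every rate is a parameter you must fill in from the current pricing page, the example rates are placeholders rather than real prices, and the 730-hour month and function names are my assumptions. The per-node throughput figure comes from the architecture notes earlier. Data Access audit log costs are not modeled.

```python
import math

QPS_PER_NODE = 10_000   # per-node figure from the architecture notes
HOURS_PER_MONTH = 730   # assumed average month length in hours


def nodes_needed(target_qps, qps_per_node=QPS_PER_NODE):
    """Smallest node count that covers the target throughput (minimum 1)."""
    return max(1, math.ceil(target_qps / qps_per_node))


def monthly_estimate(nodes, node_hourly_usd, storage_gb, storage_gb_month_usd,
                     egress_gb=0.0, egress_gb_usd=0.0,
                     backup_gb=0.0, backup_gb_month_usd=0.0):
    """Return a cost breakdown in USD; all rates are supplied by the caller."""
    costs = {
        "nodes": nodes * node_hourly_usd * HOURS_PER_MONTH,
        "storage": storage_gb * storage_gb_month_usd,
        "network": egress_gb * egress_gb_usd,
        "backups": backup_gb * backup_gb_month_usd,
    }
    costs["total"] = sum(costs.values())
    return costs


# Placeholder rates for illustration only; look up the real ones.
nodes = nodes_needed(target_qps=25_000)
estimate = monthly_estimate(nodes, node_hourly_usd=1.00,
                            storage_gb=1_000, storage_gb_month_usd=0.10)
print(nodes, estimate)
```

Node cost usually dominates a small deployment, so sizing the node count from expected throughput is the first number to pin down.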

Bigtable IAM

These notes on IAM are just tidbits that I want to remember. Refer to this guide for more details.

  • You can configure access control at the following levels:
    • project
    • instance
    • table
  • IAM Permission Categories
    • App profile permissions
    • Backups permissions
    • Cluster permissions
    • Hot tablets permissions
    • Instance permissions
    • Key visualizer permissions
    • Location permissions
    • Table permissions
  • Predefined IAM Roles
    • Bigtable Administrator – roles/bigtable.admin
    • Bigtable Reader – roles/bigtable.reader
    • Bigtable User – roles/bigtable.user
    • Bigtable Viewer – roles/bigtable.viewer
  • In Bigtable, you cannot grant access to the following types of principals:
    • allAuthenticatedUsers
    • allUsers
  • Instance-level IAM management
    • gcloud bigtable instances set-iam-policy INSTANCE_ID POLICY_FILE
  • Table-level IAM management
    • gcloud bigtable instances tables set-iam-policy TABLE_ID --instance=INSTANCE_ID POLICY_FILE
  • IAM conditions
    • Date/time attributes
      • Use to set temporary (expiring), scheduled, or limited-duration access to Bigtable resources. For example, you can allow a user to access a table until a specified date.
    • Resource attributes
      • Use to configure conditional access based on a resource name, resource type, or resource service attributes. In Bigtable, you can use attributes of instances, clusters, and tables to configure conditional access. For example, you can allow a user to manage tables only on tables that begin with a specific prefix, or you can allow a user to access only a specific table.
  • Example IAM Policy File – note I have not verified this yet on a real Bigtable instance.
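A minimal sketch of what such a policy file could contain, generated with Python so the condition expression is easy to vary. As the note above says, this has not been verified on a real Bigtable instance. The member, project, instance, and table prefix are hypothetical placeholders, and `build_policy` is my own helper. In practice you would probably start from the output of get-iam-policy and keep its etag rather than writing a policy from scratch.

```python
import json


def build_policy(member, role, table_prefix=None, expires_at=None):
    """Build an IAM policy dict with an optional resource/time condition."""
    clauses = []
    if table_prefix:
        # Resource attribute: only tables whose full name starts with the prefix.
        clauses.append(f'resource.name.startsWith("{table_prefix}")')
    if expires_at:
        # Date/time attribute: access ends at the given RFC 3339 timestamp.
        clauses.append(f'request.time < timestamp("{expires_at}")')

    binding = {"role": role, "members": [member]}
    if clauses:
        binding["condition"] = {
            "title": "limited-access",
            "description": "Illustrative condition, not verified",
            "expression": " && ".join(clauses),
        }
    # Policies that contain conditions need policy version 3.
    return {"version": 3 if clauses else 1, "bindings": [binding]}


policy = build_policy(
    member="user:alice@example.com",                                      # placeholder
    role="roles/bigtable.reader",
    table_prefix="projects/my-project/instances/my-instance/tables/test-",  # placeholder
    expires_at="2030-01-01T00:00:00Z",                                    # placeholder
)
print(json.dumps(policy, indent=2))
```

The printed JSON could be saved as the POLICY_FILE argument for the set-iam-policy commands listed above.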

 

Terraform

Documentation Resources

Interesting Google Articles

Interesting Third-party Articles

Links for Developers

Practice Resources

Cloud Bigtable is fairly expensive to practice with personally. A single-node instance will cost about $650 per month. For that reason, I recommend doing your practice in Google Qwiklabs so that the cost is zero beyond your membership fees.

  • Qwiklabs – Manage Bigtable on Google Cloud – Quest
    • GSP1053: Designing and Querying Bigtable Schemas
    • GSP1054: Creating and Populating a Bigtable Instance
    • GSP1055: Streaming Data to Bigtable
    • GSP1056: Managing Bigtable Health and Performance
  • Qwiklab – Additional Labs
    • GSP099: Bigtable: Qwik Start – Command Line
    • GSP1038: Introduction to Cloud Bigtable (Java)
  • Cloud Bigtable Emulator
    • gcloud components update
    • gcloud components install beta
    • gcloud components install bigtable
    • gcloud beta emulators bigtable start
    • gcloud beta emulators bigtable start --host-port=[HOST]:[PORT]
    • docker run -p 127.0.0.1:8086:8086 --rm -ti google/cloud-sdk gcloud beta emulators bigtable start --host-port=0.0.0.0:8086
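Once the emulator is running, client tools find it through the BIGTABLE_EMULATOR_HOST environment variable. The Python sketch below only builds that environment for a child process. The helper name is mine, the port matches the docker example above, and the commented-out cbt call uses placeholder project and instance names that I have not run here.

```python
import os


def emulator_env(host="localhost", port=8086, base_env=None):
    """Return a copy of the environment that points clients at the emulator."""
    env = dict(os.environ if base_env is None else base_env)
    env["BIGTABLE_EMULATOR_HOST"] = f"{host}:{port}"
    return env


env = emulator_env()
print(env["BIGTABLE_EMULATOR_HOST"])

# Possible usage (assumption, requires cbt to be installed):
# import subprocess
# subprocess.run(["cbt", "-project", "test-project",
#                 "-instance", "test-instance", "ls"], env=env, check=True)
```

Building a separate environment dict, instead of exporting the variable globally, keeps other shells and scripts pointed at the real service.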

YouTube Videos

Cloud Bigtable CLI

The cbt CLI is a command-line interface for performing several different operations on Cloud Bigtable.

Installation

On Windows, installation requires “Run as Administrator”.

  • gcloud components update
  • gcloud components install cbt
  • Create a .cbtrc file
  • cbt listinstances
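The .cbtrc file in the steps above holds default values so that you do not have to pass the project and instance on every cbt command. A small Python sketch of generating it follows; the project and instance IDs are placeholders, and the helper names are my own.

```python
from pathlib import Path


def render_cbtrc(project_id, instance_id):
    """Return the text of a .cbtrc file with default project and instance."""
    return f"project = {project_id}\ninstance = {instance_id}\n"


def write_cbtrc(project_id, instance_id, path=None):
    """Write the file; defaults to ~/.cbtrc when no path is given."""
    target = Path(path) if path else Path.home() / ".cbtrc"
    target.write_text(render_cbtrc(project_id, instance_id))
    return target


# Preview only; call write_cbtrc(...) to actually create the file.
print(render_cbtrc("my-project-id", "my-instance-id"))
```

With the file in place, a command such as cbt listinstances should pick up the defaults without extra flags.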

Questions / Practice

  • Backup/Restore
    • How to create a backup
    • How to export a backup
    • How to restore from a backup

Summary

Of the Google Cloud managed database services, Bigtable is the most impressive. The management and monitoring GUIs are the best I have experienced across the big three cloud vendors. You can deploy a cluster in one region in 5 minutes, fill it with data and then add replicas around the world in minutes. No downtime, no customer interruption, no procedure to copy data, seed replication, etc. It does everything for you. Simply amazing.

I wish Google Cloud would consider creating a small single-node Bigtable instance that costs $100/month so that small companies can implement Bigtable without worrying about costs. Small companies do not need the performance that even a single node provides. As these companies grow, they can switch to a standard instance type. Another option would be a standard node at a reduced price for the first 12 months. Trust me, they won’t give up Bigtable when the promotion expires.