Lorenzo Caggioni has a great blog post on how to visualise VPC Service Controls violations using Big Query and Looker Studio (formerly Data Studio). The dashboard is super useful for understanding which services, identities or GCP projects are generating the most violations. This makes it easier to determine where to start focusing your efforts in addressing the violations.
This short post provides the terraform code to:
- Create the log sink to Big Query for VPC SC.
- Create a table to use for the dashboard.
variable "gcp_org_id" {
type = string
description = "The GCP Organization ID"
default = "changme"
}
locals {
bq_project = "changeme"
org_service_account = "service-org-${var.gcp_org_id}@gcp-sa-logging.iam.gserviceaccount.com"
# You can customise this filter to only show logs for specific services
base_filter = <<EOF
protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
protoPayload.metadata.@type="type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"
EOF
}
# Create an org level log sink
resource "google_logging_organization_sink" "bq_log_sink" {
name = "vpcsc-bq-sink"
org_id = var.gcp_org_id
# "bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/[DATASET]"
destination = "bigquery.googleapis.com/${google_bigquery_dataset.vpcsc_violations_dataset.id}"
filter = trimspace(local.base_filter)
bigquery_options {
use_partitioned_tables = true
}
include_children = true
}
# This permission is required as documented here https://cloud.google.com/logging/docs/export/configure_export_v2#dest-auth
# Grant the org level service account the Big Query editor role
resource "google_bigquery_dataset_iam_member" "vpcsc_violations_dataset" {
dataset_id = google_bigquery_dataset.vpcsc_violations_dataset.dataset_id
role = "roles/bigquery.dataEditor"
member = "serviceAccount:${local.org_service_account}"
}
# Create a BQ Data set where the log sink will export VPC SC logs
# This will then be visualised in Looker / Data Studio
resource "google_bigquery_dataset" "vpcsc_violations_dataset" {
dataset_id = "vpsc_violations_data"
friendly_name = "vpsc-violations-data"
description = "VPC Service Control Violations Data"
location = "EU"
# Make a group (or it could be a user but the former is preferred) the owner of the data set
access {
role = "OWNER"
group_by_email = "mygroup@acme.com"
}
# Grant the org service account writer permissions
access {
role = "WRITER"
user_by_email = local.org_service_account
}
access {
role = "READER"
group_by_email = "anothergroup@acme.com"
}
}
# Create a table from a view - this extracts nested fields from
# the field protopayload_auditlog.metadataJson
# this makes it easier to show information from this field in Data Studio/Looker charts
resource "google_bigquery_table" "cleaned_vpcsc_violations_table" {
dataset_id = google_bigquery_dataset.vpcsc_violations_dataset.dataset_id
table_id = "cleaned_vpcsc_violations_table"
view {
query = templatefile("${path.module}/vpcsc-violations-bq-table.tpl", {
project = local.bq_project
dataset = google_bigquery_dataset.vpcsc_violations_dataset.dataset_id
}
)
use_legacy_sql = false
}
}
Here is some example SQL to create the table view:
SELECT
*,
CASE
WHEN REGEXP_CONTAINS(protopayload_auditlog.status.message,'(Dry Run Mode)*') THEN 'Dry Run'
ELSE 'Enforced'
END enforced_type,
CASE
WHEN REGEXP_CONTAINS(protopayload_auditlog.requestMetadata.callerIp, r"(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)") OR protopayload_auditlog.requestMetadata.callerIp = 'private' THEN 1
ELSE 0
END internal_ip,
REGEXP_EXTRACT(protopayload_auditlog.metadataJson, r'servicePerimeterName":"[a-zA-Z]+\/[\d]+\/[a-zA-Z]+\/([a-zA-Z_]+)') as perimeter,
REGEXP_EXTRACT(protopayload_auditlog.metadataJson, r'violationReason":"([a-zA-Z_]+)') as violation_reason,
FROM
`${project}.${dataset}.cloudaudit_googleapis_com_policy`
This is the same example query that Lorenzo provides in hist post - the only change is to permit underscores in the RegEx for the perimeter.
The Google documentation also have their own example:
SELECT
receiveTimestamp, #time of violation
Resource.labels.service, #protected Google Cloud service being blocked
protopayload_auditlog.methodName, #method name being called
resource.labels.project_id as PROJECT, #protected project blocking the call
protopayload_auditlog.authenticationInfo.principalEmail, #caller identity
protopayload_auditlog.requestMetadata.callerIp, #caller IP
CASE
WHEN REGEXP_CONTAINS(protopayload_auditlog.requestMetadata.callerIp, r"(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)") OR protopayload_auditlog.requestMetadata.callerIp = 'private' THEN 1
ELSE 0
END internal_ip, # Bool to indicate whether the caller IP is an RFC1918 address range or not
JSON_EXTRACT(protopayload_auditlog.metadataJson, '$.dryRun') as DRYRUN, #dry-run indicator
JSON_EXTRACT(protopayload_auditlog.metadataJson, '$.violationReason') as REASON, #reason for violation
protopayload_auditlog.metadataJson, #raw violation entry
FROM
`${project}.${dataset}.cloudaudit_googleapis_com_policy`
WHERE
JSON_EXTRACT(protopayload_auditlog.metadataJson, '$.dryRun') = "true" #ensure these are dry-run logs
The good thing about Google’s version is that it uses the as
keyword to create aliases for table columns that are more user friendly, so you do not need to rename them in Looker Studio. I have amended it to include the column that indicates if the IP is internal or not from Lorenzo’s version.