This document provides a detailed overview and guide for using the Google Cloud BigQuery Batch Load OutSystems component. This component facilitates efficient, scalable, and automated data ingestion into Google BigQuery, alongside robust query execution and storage management capabilities.
The Google Cloud BigQuery Batch Load component is designed to simplify interactions between OutSystems applications and Google BigQuery. It leverages Google Cloud Storage for bulk data processing, allowing you to ingest large volumes of data cost-effectively. The component covers the entire lifecycle of data ingestion, from automatic dataset and table creation to monitoring and cleanup of processed files.
A dedicated Test and Monitor application is included to assist with configuration, testing, and ongoing management of your BigQuery operations.
This component is ideal for scenarios requiring:
High-volume, cost-effective data loading: Utilize BigQuery's free batch loading tier for large JSON datasets.
Automated schema management: Datasets and tables are created automatically based on your input data.
Flexible query execution: Run various SQL queries against your BigQuery data.
Simplified storage management: Control and clean up files in your Google Cloud Storage bucket.
Monitoring and error handling: Track the status of your data ingestion jobs.
The component exposes several public service and server actions, categorized for clarity.
After installing the component from Forge, you will have access to the BigQueryBatchLoad_IS module and the BigQueryBatchLoadManager application.
The BigQueryBatchLoadManager application serves as your control panel for configuring and testing the component. Navigate to the Status tab to begin.
On the Status screen, you will configure and test your Google Cloud Project details.
Key Configuration Fields:
Google Project Id: Your Google Cloud Project ID.
Service Account Private Key: The private key of your Google Cloud service account.
Service Account Email: The email address of your Google Cloud service account.
Token Life Span in Seconds: The validity period, in seconds, of the generated authentication tokens.
Google Storage Bucket Name: The name of the Google Cloud Storage bucket used for staging JSON files.
Storage Upload Timeout: Timeout for uploading files to the bucket.
Remove Logs & Older Unsuccessful Records After (X Days): Defines the retention period for logs and unsuccessful records.
Important Permissions: Ensure your Google Service Account has the necessary permissions as outlined in the "Notes" section of the configuration screen, including permissions to view/list, create, and delete storage files, and to create/insert data into BigQuery datasets and tables.
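For reference, the credential fields above correspond to what a standard Google service-account token exchange requires. The component performs this exchange internally; the following is a minimal Python sketch using the official google-auth library that you could use to verify the same credentials outside OutSystems. All field values are placeholders.

```python
# Minimal sketch: verifying service-account credentials with google-auth.
# The component handles this internally; values below are placeholders.
from google.oauth2 import service_account
from google.auth.transport.requests import Request

credentials = service_account.Credentials.from_service_account_info(
    {
        "type": "service_account",
        "project_id": "my-project-id",  # Google Project Id
        "client_email": "svc@my-project-id.iam.gserviceaccount.com",  # Service Account Email
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",  # Service Account Private Key
        "token_uri": "https://oauth2.googleapis.com/token",
    },
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/devstorage.read_write",
    ],
)

credentials.refresh(Request())  # exchanges the signed JWT for an access token
print("Authenticated; token expires at:", credentials.expiry)
```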
The primary method for ingesting data is GCBigQueryFileProcess. This action handles sending your JSON data to Google Cloud Storage, which then triggers the import process into BigQuery.
Action: GCBigQueryFileProcess (Service Action)
Description: Uploads a JSON array file to the configured Google Cloud Storage bucket. BigQuery then automatically imports this file, creating the dataset and table if they don't already exist. The automatically created datasets and tables are configured with case-insensitive collation for text attributes, simplifying search and filtering.
Input Parameters:
DatasetName (Text): The name of the BigQuery dataset.
TableName (Text): The name of the BigQuery table.
JSONFile (Text): A JSON array string (e.g., [{"name":"Name 1"},{"name":"Name 2"}]).
TotalRecords (Long Integer): The count of records in the JSON array, primarily for logging and monitoring purposes.
Output Parameters:
FileGuid (Text): A unique identifier for the uploaded file, useful for monitoring its status.
ErrorMessage (Text): Returns an error message if the process fails.
Important Note: The component fully manages the cleanup of successfully imported files from the Google Cloud Storage bucket, based on the configured retention period. Unsuccessfully processed files are automatically retried and resent until they are successfully imported.
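For context, the flow this action wraps resembles the Python sketch below, built on the official google-cloud-storage and google-cloud-bigquery clients. This is an illustrative approximation, not the component's actual implementation: the NDJSON conversion reflects BigQuery's batch-load format, and all names are examples.

```python
# Illustrative sketch of the flow GCBigQueryFileProcess wraps.
# The component's internals may differ; all names are examples.
import json
import uuid
from google.cloud import bigquery, storage

def file_process(dataset_name, table_name, json_file, bucket_name):
    records = json.loads(json_file)  # JSONFile: a JSON array string
    ndjson = "\n".join(json.dumps(r) for r in records)  # batch loads use NDJSON

    file_guid = str(uuid.uuid4())  # analogous to the FileGuid output
    blob = storage.Client().bucket(bucket_name).blob(f"{file_guid}.json")
    blob.upload_from_string(ndjson, timeout=120)  # Storage Upload Timeout

    bq = bigquery.Client()
    bq.create_dataset(dataset_name, exists_ok=True)  # dataset created if missing

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,                        # schema inferred from the data
        create_disposition="CREATE_IF_NEEDED",  # table created if missing
        write_disposition="WRITE_APPEND",
    )
    job = bq.load_table_from_uri(
        f"gs://{bucket_name}/{file_guid}.json",
        f"{bq.project}.{dataset_name}.{table_name}",
        job_config=job_config,
    )
    return file_guid, job.job_id
```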
Once a file is sent via GCBigQueryFileProcess, you can monitor its status using the GCBigQueryFileStatus action and the Monitoring tab in the manager application.
Action: GCBigQueryFileStatus (Service Action)
Description: This action queries the status of a specific BigQuery load job associated with a FileGuid. It provides detailed feedback on whether the file has been processed, if any errors occurred, and allows for immediate cleanup of the source file from the Google Cloud Storage bucket upon successful import.
FileGuid (Text): The unique identifier returned by the GCBigQueryFileProcess action. This is the key to tracking your file.
DeleteIfProcessed (Boolean): If set to True, this action will immediately delete the corresponding file from the Google Cloud Storage bucket after confirming it has been successfully processed by BigQuery. This is a highly efficient way to manage storage, as it cleans up files as they complete, rather than waiting for a scheduled bulk cleanup.
IsProcessed (Boolean): Returns True if the file has been successfully imported into BigQuery.
JobErrors (Text): If the import failed, this output contains the specific error message from the BigQuery job (e.g., schema mismatch, data format error).
FileNotFound (Boolean): Returns True if the file corresponding to the provided FileGuid could not be located.
ErrorMessage (Text): Returns a general error message if the status check action itself fails (e.g., due to an authentication issue).
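Conceptually, the status check resembles the following hedged Python sketch. The job-id and blob naming are hypothetical; the component tracks these internally and resolves them from the FileGuid.

```python
# Hedged sketch of the kind of status check GCBigQueryFileStatus performs;
# job-id and blob naming are hypothetical.
from google.cloud import bigquery, storage

def file_status(job_id, bucket_name, blob_name, delete_if_processed=False):
    job = bigquery.Client().get_job(job_id)  # look up the load job
    is_processed = job.state == "DONE" and not job.errors  # -> IsProcessed
    job_errors = "; ".join(e["message"] for e in (job.errors or []))  # -> JobErrors

    if is_processed and delete_if_processed:  # mirrors DeleteIfProcessed
        storage.Client().bucket(bucket_name).blob(blob_name).delete()
    return is_processed, job_errors
```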
The component provides two distinct actions for running SQL queries against your BigQuery datasets, catering to different timeout requirements.
GCBigQueryExecuteQueryService
Description: Ideal for quick, synchronous queries expected to complete within the standard platform timeout. This action runs with a fixed timeout capped at 100 seconds.
Inputs: SQLQuery (Text)
Outputs: ResultJSON (Text), TotalRows (Integer), ProcessingTimeMs (Long Integer), ErrorMessage (Text)
GCBigQueryExecuteQuery
Description: Designed for long-running or complex queries. As a public server action, it allows you to specify a custom timeout that can exceed 100 seconds, preventing platform timeout exceptions for heavy analytical workloads.
Inputs: SQLQuery (Text), Timeout (Integer)
The Execute tab in the management application provides a simple UI to run ad-hoc queries using this functionality.
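Under the hood, both actions amount to running a BigQuery query job and waiting on its result with a timeout. An illustrative Python sketch follows; the mapping onto the output parameters is an assumption based on the parameter names above.

```python
# Illustrative sketch of the query pattern behind both execute actions;
# the timeout parameter mirrors the Timeout input of GCBigQueryExecuteQuery.
import json
import time
from google.cloud import bigquery

def execute_query(sql_query, timeout_seconds=100):
    client = bigquery.Client()
    start = time.monotonic()
    rows = client.query(sql_query).result(timeout=timeout_seconds)  # blocks until done
    result_json = json.dumps([dict(row) for row in rows], default=str)
    processing_time_ms = int((time.monotonic() - start) * 1000)
    # -> ResultJSON, TotalRows, ProcessingTimeMs
    return result_json, rows.total_rows, processing_time_ms
```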
The component includes several other utility actions for comprehensive management of your BigQuery and Google Cloud Storage resources.
GCBigQueryListDatasets: Lists all datasets within your configured Google Cloud Project.
GCBigQueryListTables: Lists all tables within a specified dataset.
The Datasets & Tables tab in the management app uses these actions to allow you to browse your BigQuery structure.
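For reference, the equivalent operations with the official google-cloud-bigquery Python client look roughly like this (an illustrative sketch, not the component's internals):

```python
# Illustrative equivalents of GCBigQueryListDatasets and GCBigQueryListTables.
from google.cloud import bigquery

client = bigquery.Client()
for dataset in client.list_datasets():                    # all datasets in the project
    print(dataset.dataset_id)
    for table in client.list_tables(dataset.dataset_id):  # tables in that dataset
        print(" ", table.table_id)
```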
GCStorageFileList: Lists all files currently in the configured Google Cloud Storage bucket (see the sketch after this list).
GCStorageFileDownload: Downloads a specific file from the bucket.
GCStorageFileDelete: Deletes a specific file from the bucket.
GCBatchLoadFileDelete: Deletes a file record from the component's internal logging/tracking tables.
GCBatchLoadCleanUp: A batch job action that cleans up old log records and processed files from the bucket immediately, ignoring the retention policy set in the configuration.
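The three storage actions (GCStorageFileList, GCStorageFileDownload, GCStorageFileDelete) map closely onto the google-cloud-storage Python client. A hedged sketch, with placeholder bucket and file names:

```python
# Hedged equivalents of the storage utility actions;
# bucket and file names are placeholders.
from google.cloud import storage

bucket = storage.Client().bucket("my-staging-bucket")

for blob in bucket.list_blobs():                           # GCStorageFileList
    print(blob.name, blob.size, blob.updated)

data = bucket.blob("example.json").download_as_bytes()     # GCStorageFileDownload
bucket.blob("example.json").delete()                       # GCStorageFileDelete
```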
The Google Cloud BigQuery Batch Load component includes a powerful management application that serves as your central hub for configuration, testing, and monitoring. This guide explains the functionality of each screen available in the app menu.
This is the main landing and configuration page. It provides an at-a-glance health check of your setup and is where you'll input your Google Cloud credentials.
Key Functions:
Configuration Status: Confirms that all necessary configuration fields have been filled.
Authentication Test: Shows whether the provided credentials are valid and can successfully authenticate with your Google Cloud project.
Configuration Form: This is where you set up the connection. You must provide your Google Project ID, Service Account Email, Service Account Private Key, and the Google Storage Bucket Name that will be used for staging files.
Settings: You can also configure operational parameters like token lifespan, upload timeouts, and the retention period for old logs and files.
The Test screen provides a simple, two-step interface to perform a live, end-to-end test of the data ingestion pipeline, from sending a JSON file to checking its import status.
How to Use:
Send: In the first panel, specify a Dataset Name and Table Name. Paste your sample JSON File Data (as a JSON array) into the text area and enter the Total Records count. Click Send & Process. This action uploads the data to your bucket and initiates the BigQuery load job, returning a unique File GUID.
Check: In the second panel, paste the File GUID you received from the send step. Click Check Status to query the real-time status of the import process.
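If you want to generate the sample payload programmatically before pasting it into the Test screen, a snippet like the following (Python, purely illustrative) produces values in the shape the JSON File Data and Total Records fields expect:

```python
# Illustrative helper for producing the JSON File Data and Total Records
# values expected by the Test screen (and by GCBigQueryFileProcess).
import json

records = [{"name": "Name 1"}, {"name": "Name 2"}]  # sample rows
json_file = json.dumps(records)  # -> '[{"name": "Name 1"}, {"name": "Name 2"}]'
total_records = len(records)     # -> 2
```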
This screen is your central dashboard for observing all data ingestion jobs. It gives you full visibility into the files being processed and provides a history of completed jobs (deleted files).
Key Features:
Active Files / Deleted Files Toggle: Switch between two views:
Active Files: Shows all files currently being processed or awaiting processing. You can see the file's GUID, target dataset/table, and status. This view is essential for tracking ongoing jobs.
Deleted Files: Provides a historical log of jobs that have been processed and whose source files have been cleaned up from the storage bucket.
Clean-Up Now: A manual trigger that runs the cleanup logic to remove all successfully processed files from your storage bucket immediately.
Refresh: Updates the dashboard with the latest status of all jobs.
The Storage screen gives you direct access to view and manage the files within your configured Google Cloud Storage bucket.
File Listing: Displays a list of all files currently in the staging bucket, along with their size and creation/update timestamps.
File Actions: For each file, you have two options:
Download (cloud icon): Download a copy of the file directly.
Delete (trash can icon): Manually delete the file from the bucket.
Note: This screen also reminds you of the automatic cleanup logic that removes files after a successful import or after they exceed the configured age for unsuccessful imports.
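As an aside, age-based cleanup of this kind can be sketched in a few lines with the google-cloud-storage client. This is an assumption about the general shape of the logic, not the component's actual criteria; the threshold below maps to the "Remove Logs & Older Unsuccessful Records After (X Days)" setting.

```python
# Hedged sketch of age-based bucket cleanup, analogous to the component's
# retention logic; the exact selection criteria are an assumption.
from datetime import datetime, timedelta, timezone
from google.cloud import storage

RETENTION_DAYS = 7  # "Remove Logs & Older Unsuccessful Records After (X Days)"
cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)

bucket = storage.Client().bucket("my-staging-bucket")
for blob in bucket.list_blobs():
    if blob.time_created < cutoff:  # older than the retention window
        blob.delete()
```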
This screen provides a simple but powerful tool for running ad-hoc SQL queries directly against your BigQuery project from within OutSystems.
Query Editor: A text area where you can write and edit your SQL queries.
Timeout Setting: Allows you to specify a custom timeout in seconds. This is crucial for running complex or long-running queries that might otherwise exceed standard platform limits.
Execute Button: Runs the query and displays the results.
This screen acts as a simple browser for your BigQuery project, allowing you to see the datasets and tables that exist. It's a great way to quickly verify that the component is creating your schemas as expected.
Dataset List: On the left, it lists all the datasets available in your configured project.
Table List: When you select a dataset, the panel on the right populates with a list of all the tables contained within that dataset.