GO8329 Logging, Monitoring, and Observability in Google Cloud
This three-day instructor-led course teaches participants techniques for monitoring, troubleshooting, and improving infrastructure and application performance in Google Cloud. Guided by the principles of Site Reliability Engineering (SRE), and using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.
COURSE OBJECTIVE:
This course teaches participants the following skills:
• Plan and implement a well-architected logging and monitoring infrastructure
• Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
• Create effective monitoring dashboards and alerts
• Monitor, troubleshoot, and improve Google Cloud infrastructure
• Analyze and export Google Cloud audit logs
• Find production code defects, identify bottlenecks, and improve performance
• Optimize monitoring costs
TARGET AUDIENCE:
This class is intended for the following participants:
• Cloud architects, administrators, and SysOps personnel
• Cloud developers and DevOps personnel
COURSE PREREQUISITES:
To get the most out of this course, participants should have:
• Google Cloud Platform Fundamentals: Core Infrastructure or equivalent experience
• Basic scripting or coding familiarity
• Proficiency with command-line tools and Linux operating system environments
COURSE CONTENT:
The course includes presentations and hands-on labs.

Module 1: Introduction to Google Cloud Monitoring Tools
• Understand the purpose and capabilities of Google Cloud operations-focused components: Logging, Monitoring, Error Reporting, and Service Monitoring
• Understand the purpose and capabilities of Google Cloud application performance management-focused components: Debugger, Trace, and Profiler

Module 2: Avoiding Customer Pain
• Construct a monitoring base on the four golden signals: latency, traffic, errors, and saturation
• Measure customer pain with SLIs
• Define critical performance measures
• Create and use SLOs and SLAs
• Achieve developer and operations harmony with error budgets

Module 3: Alerting Policies
• Develop alerting strategies
• Define alerting policies
• Add notification channels
• Identify types of alerts and common uses for each
• Construct and alert on resource groups
• Manage alerting policies programmatically

Module 4: Monitoring Critical Systems
• Choose best-practice monitoring project architectures
• Differentiate Cloud IAM roles for monitoring
• Use the default dashboards appropriately
• Build custom dashboards to show resource consumption and application load
• Define uptime checks to track aliveness and latency

Module 5: Configuring Google Cloud Services for Observability
• Integrate logging and monitoring agents into Compute Engine VMs and images
• Enable and utilize Kubernetes Monitoring
• Extend and clarify Kubernetes monitoring with Prometheus
• Expose custom metrics through code, and with the help of OpenCensus

Module 6: Advanced Logging and Analysis
• Identify and choose among resource tagging approaches
• Define log sinks (inclusion filters) and exclusion filters
• Create metrics based on logs
• Define custom metrics
• Link application errors to Logging using Error Reporting
• Export logs to BigQuery

Module 7: Monitoring Network Security and Audit Logs
• Collect and analyze VPC Flow logs and Firewall Rules logs
• Enable and monitor Packet Mirroring
• Explain the capabilities of Network Intelligence Center
• Use Admin Activity audit logs to track changes to the configuration or metadata of resources
• Use Data Access audit logs to track accesses or changes to user-provided resource data
• Use System Event audit logs to track GCP administrative actions

Module 8: Managing Incidents
• Define incident management roles and communication channels
• Mitigate incident impact
• Troubleshoot root causes
• Resolve incidents
• Document incidents in a post-mortem process

Module 9: Investigating Application Performance Issues
• Debug production code to correct code defects
• Trace latency through layers of service interaction to eliminate performance bottlenecks
• Profile and identify resource-intensive functions in an application

Module 10: Optimizing the Costs of Monitoring
• Analyze resource utilization costs for monitoring-related components within Google Cloud
• Implement best practices for controlling the cost of monitoring within Google Cloud
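
As a rough illustration of the SLI, SLO, and error-budget ideas listed under Module 2, the arithmetic can be sketched in a few lines of Python. This is not course material and does not call any Google Cloud API; the function names and the request counts are hypothetical, and the sketch assumes a simple availability SLI (good requests divided by total requests).

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Availability SLI: fraction of requests served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent.

    1.0 means the budget is untouched, 0.0 means it is exactly used up,
    and a negative value means the SLO has been violated.
    """
    allowed_failure = 1.0 - slo      # failure rate the SLO tolerates
    if allowed_failure <= 0:
        raise ValueError("slo must be below 1.0")
    actual_failure = 1.0 - sli       # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 100,000 requests, 50 failures, 99.9% SLO.
sli = availability_sli(good_requests=99_950, total_requests=100_000)
remaining = error_budget_remaining(sli, slo=0.999)
print(f"SLI={sli:.4%}, error budget remaining={remaining:.0%}")
```

With these assumed numbers, half of the tolerated failures have occurred, so roughly half of the error budget is left for the measurement window.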
FOLLOW ON COURSES:
Not available. Please contact.