Built-In Spark UI: Real-Time Job Tracking For Spark Batches
Dataproc Serverless: faster, simpler, and smarter. New features further improve the speed, ease of use, and intelligence of Dataproc Serverless.
Elevate your Spark experience with:
Native query execution: Use the new native query execution in the Premium tier to see significant speed improvements.
Seamless monitoring with Spark UI: With a built-in Spark UI that is available by default for all Spark batches and sessions, you can monitor job progress in real time.
Streamlined investigation: Troubleshoot batch jobs from a single "Investigate" page that automatically filters logs by errors and highlights all the important metrics.
Proactive autotuning and assisted troubleshooting with Gemini: Let Gemini reduce failures and tune performance by analyzing historical patterns. Use Gemini-powered insights and recommendations to resolve problems quickly.
Accelerate your Spark jobs with native query execution
By enabling native query execution, you can significantly increase the performance of your Spark batch jobs in the Premium tier on Dataproc Serverless Runtimes 2.2.26+ or 1.2.26+ without making any changes to your application.
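As a quick illustration of the runtime requirement, the following Python sketch checks whether a runtime version string meets the version floors quoted above. It is not part of any Dataproc tooling: the helper names are hypothetical, and it assumes that "2.2.26+" means patch level 26 or later within the 2.2 line (and likewise for 1.2). Runtime lines not named in the text are treated as unsupported simply because the text does not mention them.

```python
from typing import Tuple

# Minimum patch level per (major, minor) runtime line, taken from the text.
# Assumption: "2.2.26+" means patch >= 26 within the 2.2 line only.
MIN_PATCH_BY_LINE = {(1, 2): 26, (2, 2): 26}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def supports_native_query_execution(version: str) -> bool:
    """Return True if the runtime version meets a version floor from the text."""
    major, minor, patch = parse_version(version)
    min_patch = MIN_PATCH_BY_LINE.get((major, minor))
    # Runtime lines the text does not mention are reported as unsupported.
    return min_patch is not None and patch >= min_patch


if __name__ == "__main__":
    for candidate in ("1.2.25", "1.2.26", "2.2.40", "2.1.99"):
        print(candidate, supports_native_query_execution(candidate))
```

A version string that does not have exactly three numeric parts raises a ValueError here, which is usually preferable to silently guessing.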
In experiments using queries derived from the TPC-DS and TPC-H benchmarks, this new feature in the Dataproc Serverless Premium tier improved query performance by around 47%.
The performance results are based on 1TB of Parquet data in GCS and queries derived from the TPC-DS and TPC-H standards. Because these runs do not meet all of the requirements of the TPC-DS and TPC-H specifications, they are not comparable to published TPC-DS and TPC-H results.
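The text does not say whether "around 47%" refers to a reduction in run time or an increase in throughput, and the two readings give noticeably different run times. The sketch below works through both for a hypothetical query that takes 100 seconds without native query execution. The baseline and the function names are illustrative assumptions, not measured results.

```python
def runtime_if_time_reduced(baseline_seconds: float, improvement: float) -> float:
    """Reading 1: the run time itself shrinks by the given fraction."""
    return baseline_seconds * (1.0 - improvement)


def runtime_if_throughput_increased(baseline_seconds: float, improvement: float) -> float:
    """Reading 2: work done per second grows by the given fraction."""
    return baseline_seconds / (1.0 + improvement)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical baseline run time in seconds
    gain = 0.47       # the "around 47%" figure from the text

    print(f"time reduced by 47%:     {runtime_if_time_reduced(baseline, gain):.1f} s")
    print(f"throughput up by 47%:    {runtime_if_throughput_increased(baseline, gain):.1f} s")
```

Under the first reading the query finishes in about 53 seconds; under the second, in about 68 seconds. Keep the difference in mind when estimating savings for your own jobs.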
Use the native query execution qualification tool to get started right away. It makes it simple to find jobs that qualify and to estimate the possible performance gains. Once the tool has identified batch jobs suited to native query execution, you can enable it to speed up those jobs and potentially save money.
Seamless monitoring with Spark UI
Are you tired of struggling to set up and manage persistent history server (PHS) clusters just to debug your Spark batches? Wouldn't it be simpler to view the Spark UI in real time without paying for a history server?
Until recently, monitoring and debugging Spark jobs in Dataproc Serverless required setting up and maintaining a separate Spark persistent history server. Importantly, the history server had to be configured for every batch run; otherwise, the batch job could not be analyzed in the open-source UI. Additionally, the open-source UI was slow when switching between applications.
Google Cloud has clearly heard you. Dataproc Serverless now offers a fully managed Spark UI, which simplifies monitoring and troubleshooting.
In both the Standard and Premium tiers of Dataproc Serverless, the Spark UI is built in and available automatically for every batch job and session at no extra cost. Just submit your job, and you can immediately begin using the Spark UI to analyze performance in real time.
Accessing the Spark UI
The "VIEW SPARK UI" link is located in the upper right corner.
With detailed insights into your Spark job performance, the new Spark UI offers the same powerful features as the open-source Spark History Server. Easily browse running and completed applications, explore jobs, stages, and tasks, and analyze SQL queries to understand how your application executes. Use the detailed execution information to diagnose problems and identify bottlenecks quickly.
The "Executors" tab provides direct links to the relevant logs in Cloud Logging for deeper investigation, so you can look into problems with specific executors right away.
If you have already set up a Persistent Spark History Server, you can still view it by clicking the "VIEW SPARK HISTORY SERVER" link.
Streamlined investigation (Preview)
The new "Investigate" option on the Batch details page gives you immediate diagnostic highlights gathered in one place.
The "Metrics highlights" section automatically displays the key metrics, giving you a clear view of the state of your batch job. If you need more metrics, you can create a custom dashboard.
Below the metrics highlights, a "Job Logs" widget shows the logs filtered by errors, so you can quickly identify and fix problems.
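To make the idea of error-filtered logs concrete, here is a minimal Python sketch that keeps only error-level entries from a list of log records. It does not reproduce how the "Job Logs" widget is actually implemented, which the text does not describe. The record layout (plain dictionaries with "severity" and "message" keys) and the sample entries are assumptions made for illustration; the severity names follow the standard Cloud Logging levels.

```python
from typing import Dict, Iterable, List

# Cloud Logging severity levels at ERROR or above.
ERROR_SEVERITIES = {"ERROR", "CRITICAL", "ALERT", "EMERGENCY"}


def filter_error_logs(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only entries whose severity is ERROR or higher, preserving order."""
    return [
        entry
        for entry in entries
        if entry.get("severity", "DEFAULT").upper() in ERROR_SEVERITIES
    ]


# Hypothetical sample records, invented for this example.
sample_entries = [
    {"severity": "INFO", "message": "Stage 3 started"},
    {"severity": "ERROR", "message": "Executor 7 lost"},
    {"severity": "WARNING", "message": "Task retry scheduled"},
    {"severity": "CRITICAL", "message": "Driver out of memory"},
]

if __name__ == "__main__":
    for entry in filter_error_logs(sample_entries):
        print(entry["severity"], entry["message"])
```

Entries with no severity field are treated as DEFAULT and dropped, so the filtered view contains only records that were explicitly marked as errors.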
Proactive autotuning and assisted troubleshooting with Gemini (Preview)
Finally, Gemini in BigQuery can help simplify the process of optimizing hundreds of Spark properties when you submit your batch job configurations. If a job fails or runs slowly, Gemini can spare you from wading through many gigabytes of logs to debug it.
Enhance performance: Gemini can automatically tune the Spark configurations of your Dataproc Serverless batch jobs for optimal reliability and performance.
Simplify troubleshooting: By clicking "Ask Gemini" for AI-powered analysis and guidance, you can quickly diagnose and fix problems with slow or failed jobs.
Read more on Govindhtech.com













