My gender was replaced with a desire to cuddle cute people.

nbla stimboard
x x x / x o x / x x x
Dleat
My NBLA flag redesign stimboard!
x x x / x o x / x x x
JAX: Symbolic Power Unlocks Scientific Computing
Beyond Backpropagation: JAX’s Rise in Scientific Computing For years, JAX has been widely recognized for its significant role in developing large-scale artificial intelligence models within the Google ecosystem. However, a growing wave of researchers is now discovering its immense potential far beyond machine learning – particularly in scientific computing. This powerful framework’s unique…

From SLA to XLA: Why ITSM Needs an Experience Revolution
Are you still measuring IT success by SLAs alone? 🎯 In 2025, leading enterprises are redefining ITSM with Experience-Level Agreements (XLAs) — metrics focused on how users feel, not just how fast tickets are closed.
In our latest blog, discover: ✅ Why SLA compliance doesn’t guarantee employee satisfaction ✅ How GenAI + ServiceNow enable proactive, experience-led IT ✅ Real-world success stories from global enterprises
💡 Experience is the new SLA. Don’t get left behind!
🔗 Read the full article here
PyTorch/XLA 2.5: vLLM Support And Developer Improvements
PyTorch/XLA 2.5: enhanced development experience and support for vLLM
PyTorch/XLA is a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework with Cloud TPUs, and machine learning engineers have been enthusiastic about it. PyTorch/XLA 2.5 has now arrived with a number of improvements to the developer experience and added support for vLLM. This release’s features include:
A clarified plan to deprecate older torch_xla API calls in favor of the existing PyTorch API, which simplifies development. The migration of the existing distributed API is one example.
Several improvements to the torch_xla.compile function that make debugging easier while you are developing a project.
Experimental support in vLLM for TPUs, which lets you expand your current deployments and use the same vLLM interface across all of your TPUs.
Let’s examine each of these improvements.
Streamlining torch_xla API
With PyTorch/XLA 2.5, Google Cloud is taking a big step toward making the API consistent with upstream PyTorch. The goal is to make XLA devices easier to use by flattening the learning curve for developers who already know PyTorch. Where feasible, this means deprecating and phasing out custom PyTorch/XLA API calls in favor of more mature functionality, and migrating those calls to their PyTorch equivalents. Some features remain in the existing Python module until they are migrated.
To make development easier, this release uses some of the existing PyTorch distributed API functions when running models on top of PyTorch/XLA. Most of the distributed API calls have moved from the torch_xla module to torch.distributed.
With PyTorch/XLA 2.4
import torch_xla.core.xla_model as xm
xm.all_reduce()
Supported as of PyTorch/XLA 2.5
torch.distributed.all_reduce()
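For readers new to the collective itself, here is a minimal pure-Python sketch of what an all-reduce computes: every participant contributes a value, and every participant ends up with the same reduced result. It simulates the ranks as list entries inside one process and does not call torch.distributed or torch_xla; the helper name simulate_all_reduce is hypothetical and exists only for this illustration.

```python
from functools import reduce
import operator


def simulate_all_reduce(per_rank_values, op=operator.add):
    """Toy all-reduce: combine one vector per rank, give every rank the result.

    Hypothetical helper for illustration only. Real code would call
    torch.distributed.all_reduce inside each participating process.
    """
    if not per_rank_values:
        raise ValueError("need at least one rank")
    length = len(per_rank_values[0])
    if any(len(v) != length for v in per_rank_values):
        raise ValueError("all ranks must contribute vectors of equal length")
    # Reduce element-wise across ranks.
    reduced = [reduce(op, column) for column in zip(*per_rank_values)]
    # Every rank ends up holding its own copy of the same reduced vector.
    return [list(reduced) for _ in per_rank_values]


if __name__ == "__main__":
    gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three simulated ranks
    print(simulate_all_reduce(gradients))
```

In data-parallel training this is the step that sums (or averages) gradients so that all replicas apply the same update.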
A better version of “torch_xla.compile”
The release also includes a few new compilation features to help you debug or spot potential problems in your model code. For instance, “full_graph” mode raises an error when a compiled function produces more than one compilation graph. This helps you catch problems caused by multiple compilation graphs early, at compile time.
You can now also specify how many recompilations you expect for a compiled function. This helps you troubleshoot performance issues when a function is recompiled more often than necessary, for example because it exhibits unexpected dynamism.
Additionally, you can now give compiled functions a meaningful name rather than one that is generated automatically. When you read debugging messages, named compiled targets give you additional context, which makes it simpler to identify the potential issue. Here’s an illustration of how that appears in practice:
named code
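import torch
import torch_xla

# Excerpt from a class body (note `self`); the dumped HLO files take the function's name.
@torch_xla.compile
def dummy_cos_sin_decored(self, tensor):
    return torch.cos(torch.sin(tensor))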
target dumped HLO renamed with named code function name
…
module_0021.SyncTensorsGraph.4.hlo_module_config.txt
module_0021.SyncTensorsGraph.4.target_arguments.txt
module_0021.SyncTensorsGraph.4.tpu_comp_env.txt
module_0024.dummy_cos_sin_decored.5.before_optimizations.txt
module_0024.dummy_cos_sin_decored.5.execution_options.txt
module_0024.dummy_cos_sin_decored.5.flagfile
module_0024.dummy_cos_sin_decored.5.hlo_module_config.txt
module_0024.dummy_cos_sin_decored.5.target_arguments.txt
module_0024.dummy_cos_sin_decored.5.tpu_comp_env.txt
…
The listing above shows both the original and the named outputs from the same dump. “SyncTensorsGraph” is the automatically generated name. The files after it in the listing carry the name of the function from the small code example above.
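To make the idea of a recompilation budget concrete, here is a toy pure-Python sketch. It is not the torch_xla.compile API: the decorator compile_with_budget, its max_graphs and name parameters, and TooManyGraphsError are all hypothetical, and "compiling" is simulated by remembering which input shapes have been seen. It only shows why unexpected dynamism (inputs whose shape keeps changing) triggers extra compilations and how a declared budget surfaces that early.

```python
import functools


class TooManyGraphsError(RuntimeError):
    """Raised when a function needs more distinct graphs than its budget allows."""


def compile_with_budget(max_graphs, name=None):
    """Toy stand-in for a compiler wrapper (hypothetical, not the torch_xla API).

    Each new input shape counts as one simulated compilation. Shapes are
    cached, and exceeding max_graphs raises an error that includes the name.
    """
    def decorator(fn):
        label = name or fn.__name__
        compiled_shapes = set()

        @functools.wraps(fn)
        def wrapper(values):
            shape = (len(values),)
            if shape not in compiled_shapes:
                if len(compiled_shapes) >= max_graphs:
                    raise TooManyGraphsError(
                        f"{label}: shape {shape} would need graph "
                        f"#{len(compiled_shapes) + 1}, budget is {max_graphs}"
                    )
                compiled_shapes.add(shape)  # pretend a graph is compiled here
            return fn(values)

        wrapper.compiled_shapes = compiled_shapes
        return wrapper
    return decorator


@compile_with_budget(max_graphs=2, name="scale_by_two")
def scale(values):
    return [2 * v for v in values]


if __name__ == "__main__":
    scale([1, 2, 3])  # first simulated graph
    scale([4, 5, 6])  # same shape, cache hit
    scale([1, 2])     # second simulated graph
    try:
        scale([1])    # third distinct shape exceeds the budget
    except TooManyGraphsError as err:
        print(err)
```

The meaningful name in the error message plays the same role as the named HLO dump files above: it tells you which compiled target misbehaved.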
vLLM on TPU (experimental)
If you serve models on GPUs using vLLM, you can now use TPU as a backend. vLLM is a memory-efficient, high-throughput inference and serving engine for LLMs. To make model testing on TPU easier, vLLM on TPU keeps the same vLLM interface that developers like, including direct integration with the Hugging Face Model Hub.
Switching your vLLM endpoint to TPU takes only a few configuration changes. Apart from the TPU image, everything stays the same: the model source code, load balancing, autoscaling metrics, and the request payload. Refer to the installation guide for further information.
Other vLLM capabilities now available on TPU include Pallas kernels such as paged attention and flash attention, as well as performance optimizations in the dynamo bridge. All of these are now part of the PyTorch/XLA repository. Although PyTorch TPU users can now use vLLM, this work is still in progress, and the team anticipates adding more functionality and improvements in upcoming releases.
Use PyTorch/XLA 2.5
To start using these new capabilities, download the most recent version with your Python package manager. If you are new to PyTorch/XLA, see the project’s GitHub page for installation instructions and more detailed information.
Read more on Govindhtech.com
PyTorch/XLA 2.4: Pallas & developer experience, “eager mode”
For deep learning researchers and practitioners, the open-source PyTorch machine learning (ML) library and the XLA ML compiler provide flexible, powerful model training, fine-tuning, and serving. The PyTorch/XLA team is happy to announce the release of PyTorch/XLA 2.4 today. This version builds on the previous release and includes several noteworthy enhancements that address issues raised by developers. Here, we go over a few of the most recent additions that make PyTorch/XLA easier to use:
Improvements to Pallas, a custom kernel language that supports GPUs and TPUs.
New API calls.
An experimental “eager mode”.
A new TPU command line interface.
Pallas improvements
Although the XLA compiler can optimize your existing models, there are situations where custom kernel code gives model authors better performance. Pallas is a custom kernel language that supports both TPUs and GPUs, so you can write more performant code in Python, closer to the hardware, instead of dropping down to a lower-level, more complex language such as C++. Pallas is comparable to the Triton library, but because it runs on both TPUs and GPUs, it makes porting your model from one machine learning accelerator to another easier.
PyTorch/XLA 2.4 brings the following improvements to Pallas’ functionality and user experience:
Flash Attention is now fully integrated with PyTorch autograd, allowing for automatic gradient calculation.
Built-in support for Paged Attention for inference.
Support for Megablocks’ block-sparse kernels for group matrix multiplication as an autograd function, eliminating the need to write backpropagation manually.
API modifications
PyTorch/XLA 2.4 includes a few new calls that make it easier to integrate with your existing PyTorch workflow, such as:
device = torch_xla.device()
instead of:
import torch_xla.core.xla_model as xm
device = xm.xla_device()
You can also now call torch_xla.sync() instead of xm.mark_step(). These changes improve the developer workflow and make it simpler to convert your code to PyTorch/XLA.
Try out the eager mode
If you’ve worked with PyTorch/XLA for any length of time, you know that models are “lazily executed”. This means that PyTorch/XLA first builds the compute graph of operations and only then sends the model to the XLA target hardware for execution. With the new eager mode, each operation is compiled and then executed on the target hardware immediately.
The catch is that TPUs themselves have no true eager mode, because instructions are not sent to the TPU one at a time by default. On TPUs, Google Cloud therefore adds a “mark step” call after each PyTorch operation to force compilation and execution. As a result, eager mode works, but as an emulation rather than a native feature.
With this release, Google Cloud intends eager mode for your local environment rather than for production. It is meant to simplify debugging a model on your own machine without having to deploy it to a larger fleet of devices, as most production systems require.
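The difference between lazy execution and emulated eager execution can be sketched with a small pure-Python toy. This is only an illustration under simplifying assumptions: ToyDevice, LazyValue, and demo are hypothetical names, nothing here calls torch_xla, and "executing a graph" just means running the recorded Python callables in order. In lazy mode, operations accumulate until sync() is called (or a value is read); in eager mode, sync() is forced after every operation, mirroring the "mark step after each operation" approach described above.

```python
import operator


class LazyValue:
    """Result placeholder that is filled in when its graph is executed."""

    def __init__(self, device, name, compute):
        self._device = device
        self.name = name
        self._compute = compute
        self._result = None
        self.ready = False

    def _run(self):
        self._result = self._compute()
        self.ready = True

    def get(self):
        # Reading a value forces any pending graph to execute first.
        if not self.ready:
            self._device.sync()
        return self._result


class ToyDevice:
    """Hypothetical device that records operations and runs them on sync()."""

    def __init__(self, eager=False):
        self.eager = eager
        self.pending = []     # recorded operations not yet executed
        self.graphs_run = []  # one list of op names per executed graph

    def op(self, name, fn, *inputs):
        def compute():
            # Inputs are plain numbers or LazyValues from earlier operations.
            args = [x._result if isinstance(x, LazyValue) else x for x in inputs]
            return fn(*args)

        value = LazyValue(self, name, compute)
        self.pending.append(value)
        if self.eager:
            self.sync()  # emulated eager mode: execute after every operation
        return value

    def sync(self):
        if not self.pending:
            return
        graph, self.pending = self.pending, []
        for value in graph:
            value._run()
        self.graphs_run.append([v.name for v in graph])


def demo(eager):
    dev = ToyDevice(eager=eager)
    a = dev.op("add", operator.add, 1, 2)
    b = dev.op("mul", operator.mul, a, 10)
    return b.get(), dev.graphs_run


if __name__ == "__main__":
    print(demo(eager=False))
    print(demo(eager=True))
```

Both modes compute the same result. The lazy run executes one graph containing both operations, while the eager run executes two single-operation graphs, which is easier to step through when debugging but gives a compiler less to optimize at once.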
CLI to view Cloud TPU information
If you have used Nvidia GPUs, you may be familiar with the nvidia-smi tool, which lets you troubleshoot GPU workloads, see which cores are in use, and check how much memory a particular workload consumes. A comparable command line tool has now been developed for Cloud TPUs that makes it easier to retrieve device and utilization data.
Start using PyTorch/XLA 2.4 right now
The best aspect is that your current code is still compatible with PyTorch/XLA 2.4, despite the fact that it has certain API changes. Additionally, the new API methods will make your future development processes easier. What are you waiting for? Try the most recent version.
PyTorch: Key Features
Production Ready
Use TorchScript to switch between eager and graph modes with ease, and TorchServe to accelerate the path to production.
Distributed Training
The torch.distributed backend enables scalable distributed training and performance optimisation in research and production.
Robust Ecosystem
A rich ecosystem of tools and libraries extends PyTorch and supports development in computer vision, natural language processing, and other fields.
Cloud Support
Major cloud platforms support PyTorch well, enabling easy scaling and frictionless development.
XLA Features
Accelerated Linear Algebra, or XLA, is an open-source machine learning compiler. Models from well-known frameworks like PyTorch, TensorFlow, and JAX are imported into the XLA compiler, which then optimises them for high-performance execution on a variety of hardware platforms, including GPUs, CPUs, and ML accelerators. For instance, employing XLA with 8 Volta V100 GPUs produced a ~7x performance boost and ~5x batch-size improvement over the same GPUs without XLA in a BERT MLPerf submission.
Leading ML hardware and software companies, including Alibaba, Amazon Web Services, AMD, Apple, Arm, Google, Intel, Meta, and NVIDIA, are working together to develop XLA as part of the OpenXLA initiative.
Principal advantages
Build anywhere: Prominent machine learning frameworks like TensorFlow, PyTorch, and JAX have already incorporated XLA.
Run anywhere: It has pluggable infrastructure that makes it possible to add support for backends such as GPUs, CPUs, and ML accelerators.
Optimise and scale performance: It uses production-tested optimization passes and automated partitioning for model parallelism to maximize a model’s performance.
Reduce complexity: By building on MLIR, it brings the best capabilities into a single compiler toolchain, saving you from having to manage a range of domain-specific compilers.
Future-ready: XLA is an open-source project developed in collaboration with leading ML hardware and software vendors, and it is designed to operate at the cutting edge of the ML industry.