High Performance Computing

More data, less time. As businesses demand more agility and real-time processing, the time available to process data keeps shrinking. We bridge this gap by developing novel computational methods and, when that is not enough, by harnessing the massively parallel architecture of modern hardware.

General-purpose computing on graphics processing units (GPGPU)

With their streamlined architecture, GPUs are often cheaper per TFLOP of performance than CPUs with a traditional, general-purpose architecture. Free of legacy support requirements, GPUs reached the market faster, and their performance has roughly doubled each year, outpacing traditional CPUs. This GPU advantage comes at a cost, though. The memory system works differently than in a standard computer, and taking advantage of thousands of cores on a single chip requires careful scheduling of individual tasks to bring GPU utilization close to 100%.
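
To make the memory point concrete, the sketch below (an illustrative example of ours, not production code) shows the canonical CUDA pattern: consecutive threads touch consecutive addresses so that global-memory accesses coalesce into wide transactions, and a grid-stride loop keeps all cores occupied regardless of problem size.

    #include <cuda_runtime.h>

    // Illustrative kernel: scale a vector in place. Thread i of each
    // block touches element (block offset + i), so neighboring threads
    // read neighboring addresses and the loads coalesce; the grid-stride
    // loop lets a fixed-size grid cover any n.
    __global__ void scale(float *data, float alpha, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x) {
            data[i] *= alpha;
        }
    }

    int main() {
        const int n = 1 << 20;
        float *d_data;
        cudaMalloc((void **)&d_data, n * sizeof(float));
        // 256 threads per block is only a starting point; the best launch
        // configuration is hardware-specific and must be tuned.
        scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();
        cudaFree(d_data);
        return 0;
    }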

Aligned Research Group has a long history of writing optimal GPU code. Our engineers have been working with GPUs since 2009, following the evolution of the CUDA framework through the Tesla, Fermi, and Kepler architectures. In 2014, NVIDIA unveiled the newly designed Maxwell architecture, which, according to NVIDIA, improves on Kepler in energy efficiency, control-logic partitioning, workload balancing, clock-gating granularity, compiler-based scheduling, and the number of instructions issued per clock cycle, among other areas.

Our NVIDIA GPU expertise allows us to develop robust solutions built on one guiding principle: the most important factor in applying GPUs is the balance between memory usage and GPU core latency.
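
One way to picture that balance is the classic shared-memory reduction, sketched below under simplifying assumptions (power-of-two block size, no error handling): the on-chip tile hides device-memory latency, but every byte of shared memory a block consumes leaves fewer resident blocks available to cover that latency.

    #include <cuda_runtime.h>

    // Illustrative block-level sum reduction. Each block stages BLOCK
    // floats in fast shared memory, then halves the active threads each
    // step. Bigger tiles mean fewer global-memory trips per block, but
    // also fewer blocks resident per multiprocessor -- the memory/latency
    // trade-off described above.
    template <int BLOCK>
    __global__ void block_sum(const float *in, float *out, int n) {
        __shared__ float tile[BLOCK];
        int tid = threadIdx.x;
        int i = blockIdx.x * BLOCK + tid;
        tile[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();
        for (int s = BLOCK / 2; s > 0; s >>= 1) {
            if (tid < s) tile[tid] += tile[tid + s];
            __syncthreads();
        }
        if (tid == 0) out[blockIdx.x] = tile[0];  // one partial sum per block
    }
    // Launched as, e.g.:  block_sum<256><<<numBlocks, 256>>>(d_in, d_out, n);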

Success Story: RLM optimization for R: a 584× speed-up

AlphaBetaWorks, a leader in risk management, skill evaluation, and predictive performance analytics, faced progressively higher demand on the computational resources required by its proprietary algorithms. Its principal, Greg Kapustin, challenged our team to optimize the most time-consuming piece of his algorithm: robust linear regression performed in R.

Most financial asset and portfolio variance is due to a handful of systematic risk factors. Linear models, robust linear models (RLM), and their relatives are at the core of the most predictive financial risk models. R is a powerful statistical analysis tool that has been used in finance for many years.
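
For background, here is one standard formulation of an RLM (Huber's M-estimator, given as general context rather than AlphaBetaWorks' exact specification): the squared-error loss of ordinary least squares is replaced by a function that grows only linearly for large residuals, so outliers cannot dominate the fit.

    \hat{\beta} = \arg\min_{\beta} \sum_i \rho\left(y_i - x_i^{\top}\beta\right),
    \qquad
    \rho(u) =
    \begin{cases}
      \frac{1}{2}u^2,          & |u| \le k \\
      k\,|u| - \frac{1}{2}k^2, & |u| > k
    \end{cases}

Such estimators are typically fitted by iteratively reweighted least squares (IRLS), which repeats a weighted linear solve until convergence; that dense, repetitive linear algebra is exactly the kind of workload parallel hardware accelerates well.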

We were surprised to discover that R's RLM function is implemented as an external call to Fortran code through a C interface. We redesigned RLM on both the CPU and GPU platforms, with a significant result: we sped RLM up by a factor of 584! Our CPU version is freely available from CRAN (The Comprehensive R Archive Network). The key to this result was finding the right configuration of the C++ code to optimize transactions between the L1 cache and device memory.
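
The winning configuration is specific to that project, but to give a flavor of the knobs involved, CUDA exposes the split between L1 cache and shared memory directly on architectures such as Fermi and Kepler, where the two share one physical array. The hypothetical snippet below shows the relevant call; the kernel name and parameters are invented for illustration.

    #include <cuda_runtime.h>

    // Hypothetical stand-in for one iteration of the regression solver;
    // the real kernel assembles the weighted normal equations.
    __global__ void rlm_step(float *beta) { /* ... */ }

    int main() {
        // Ask the runtime for the larger L1 split for this kernel --
        // a win when it is dominated by scattered device-memory loads
        // rather than by explicitly staged shared-memory tiles.
        cudaFuncSetCacheConfig(rlm_step, cudaFuncCachePreferL1);
        // ... launch rlm_step repeatedly inside the IRLS loop ...
        return 0;
    }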

These results were presented at the official February 2015 Bay Area R Users Group (BARUG) meeting at Strata+Hadoop World.

We would be happy to help you take advantage of our optimized, GPU-based RLM function for R; please contact us for more information. We can also optimize an existing R function critical to your algorithm, or create external GPGPU-optimized code to be used in your R workflow.
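
For the external-code route, the bridge between R and the GPU is small. The sketch below shows the shape of a C++ entry point that R can reach via .Call(); the function and argument names are illustrative, and the device code is elided.

    #include <R.h>
    #include <Rinternals.h>

    // Illustrative .Call() entry point: receive the model matrix and
    // response from R, hand them to a GPU solver, return coefficients.
    extern "C" SEXP gpu_rlm(SEXP x, SEXP y) {
        int p = Rf_ncols(x);
        SEXP coef = PROTECT(Rf_allocVector(REALSXP, p));
        // ... copy REAL(x) and REAL(y) to the device, run the CUDA
        //     solver, copy fitted coefficients back into REAL(coef) ...
        UNPROTECT(1);
        return coef;
    }
    // From R, after compiling and loading the shared library:
    //   .Call("gpu_rlm", X, y)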

Embedded high-performance computation

For the most intensive image-processing workloads, we frequently recommend a GPU implementation. For embedded systems, the NVIDIA Tegra X1 embedded processor provides both the processing power needed for computer vision and image processing and low power consumption.

Our current research project in embedded high-performance computation is a real-time automotive navigation system based on road-sign recognition. Recognizing a pattern is not a challenging task when both the object and the camera are static. But when the car's trajectory is unpredictable and weather conditions are frequently harsh (rain, snow, darkness), there is only a short window in which to compute the pattern precisely. We continue our work on applying the NVIDIA Tegra X1 to accelerate the pattern-recognition algorithms, and we expect more stable and precise results in the next iteration of this project.
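
To give a flavor of the per-frame work (a simplified sketch of one early pipeline stage, not the project's actual code), the kernel below converts an RGBA camera frame to grayscale on the GPU. At 30 frames per second the entire pipeline, detection and classification included, has a budget of roughly 33 ms per frame, which is why every stage must stay on the GPU.

    #include <cuda_runtime.h>
    #include <cstdint>

    // Illustrative pre-processing kernel: RGBA frame to grayscale using
    // the standard luminance weights. One thread per pixel; the frame
    // stays in device memory for the detection stages that follow.
    __global__ void rgba_to_gray(const uchar4 *rgba, uint8_t *gray, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            uchar4 p = rgba[i];
            gray[i] = (uint8_t)(0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
        }
    }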

Please contact us with the details of your needs, and we will work with you to make the most of your hardware or help you design a cloud solution.