Achutharaman R

CEO, GradsKey

  • National Institute of Technology, Tiruchirappalli
    M.Tech Computer Science



Achutha has extensive experience in leadership roles spanning engineering management, business development, and go-to-market strategy planning and execution. Strong and calm leadership qualities. Experienced at multiple levels of management.


Dec 2019 - Present
Building a world-class AI-based social platform for connecting students and recruiters.

VP, Mobility Platforms & SaaS Cloud
AppiLy Technologies
Nov 2018 - Present
What do we do at APPiLY? Awesome mobile (iOS & Android) apps and backend server/cloud SaaS platform development and architecture, AI/ML and augmented-reality-based solutions, database scaling, server-side apps, big data, business intelligence, and cloud solution services.

President, Engineering (Mobile OS, Embedded Middleware, IoT/Smartcity)
AdiOS Private Limited (Startup)
October 2017 - October 2018
Responsible for engineering business development and management: product building (as an owner) and project delivery (as a service). Led and managed hardware and software designers and developers spread across Singapore, Chennai, and Bangalore.

Head of System Validation Org
Sep 2011 - Sep 2017

Led and managed a software product organization (30+ people), with ownership of development, quality testing, and build and release engineering, for a highly complex and critical product, the Validation Test Suite, which touches ~US$3 billion worth of Sun/Oracle hardware shipped to customers (including the Exadata DB machine, Exalogic analytics, Super Cluster enterprise servers, and server/storage hardware products).

Technologies: Processor/server architecture, memory/cache/inter- and intra-connect architecture, disk/flash technologies, network technologies, Linux/Solaris internals and device drivers, virtual domains, Unix/Solaris system programming in C/C++/Java, scripting: Perl/Python/RESTful APIs/JSON

Senior Engineering Manager
Sun Microsystems
Dec 2006 - Aug 2011

Responsible for the design and development of user-level, application-based engines to verify hardware and software product performance, availability, and stability.

At the software level, this includes verification of the Linux/Solaris OS kernel, TCP/IP networking stack, IB stack, file system, and other sub-components; at the hardware level, platform server components.

Head of Customer Technical Support Org, India
Sun Microsystems
Mar 2002 - Nov 2006

Built a 20+ member high-performance team from scratch to support Sun servers, storage, and networking products for the APAC region.

At the software level, this includes verification of the Linux/Solaris OS kernel, TCP/IP networking stack, IB stack, file system, and other sub-components; at the hardware level, platform server components.

Created an SMI-wide customized support delivery process for platinum and gold support customers, which significantly reduced escalation turnaround time and increased the customer recommendation index.

Managed data center technologies for high-profile (banking and telecom) customers and helped them improve their performance significantly, making a positive impact on their business. This created new revenue opportunities for sales through references.

Introduced an innovative concept of 'serviceability into design' and created a sustainable process to feed serviceability requirements to the product teams and quality test teams. Over the years, this has had a significant positive business impact with Sun India customers and has been well recognized.

Staff Engineer
Sun Microsystems
Jul 1997 - Feb 2002

Architected and developed an innovative on-site diagnostic test solution for production servers in high-availability systems.

Designed and developed an on-the-fly instrumentation framework for device drivers and kernel modules to add on-demand functional enhancements, performance tuning, and debugging. With this framework, many customers' performance problems were resolved and a few features were added for demanding high-profile customers, resulting in better customer satisfaction.

Led a team of engineers responsible for quality platform-dependent kernel/OS sustaining.

Managed data center technologies for high-profile (banking and telecom) customers and helped them improve their performance significantly, making a positive impact on their business. This created new revenue opportunities for sales through references.

Provided cross-team mentorship and technical training on the new engineering products.


Indian Institute of Science (IISc)
Research in the field of Computer Architecture
2003 - 2005
Conducted a comparative study and analysis of multi-path prediction of Java traces and multiple branch prediction.

National Institute of Technology, Tiruchirappalli
M.Tech Computer Science

Annamalai University
B.E Electronics & Instrumentation
1989 - 1993

Professional Achievements

3 Publications

Exploiting Java-ILP on a Simultaneous Multi-Trace Instruction Issue (SMTI) Processor
IPDPS '03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing
The available Instruction Level Parallelism in Java bytecode (Java-ILP) is not readily exploitable due to dependencies involving stack operands. The sequentialization due to stack dependency can be overcome by identifying bytecode-traces, which are sequences of bytecode instructions that when executed leave the operand-stack in the same state as it was at the beginning of the sequence. Instructions from different bytecode-traces have no stack-operand dependency and hence can be executed in parallel on multiple operand-stacks. We propose a simultaneous multi-trace instruction-issue (SMTI) architecture for a processor that can issue instructions from multiple bytecode-traces to exploit Java-ILP. Extraction of bytecode-traces and nested bytecode folding are done in software during the method verification stage. SMTI combined with nested folding resulted in an average ILP speedup of 54% over the base in-order single-issue Java processor when evaluated with the SPECjvm98, Scimark and Linpack benchmarks.
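
As a rough illustration of the bytecode-trace idea in this abstract, the following Python sketch splits a straight-line instruction sequence into traces, closing a trace whenever the operand stack returns to the depth it had when the trace began. The `(mnemonic, pops, pushes)` tuples and the `split_into_traces` helper are hypothetical simplifications for illustration; control flow and nested folding, which the paper handles, are not modeled here.

```python
from typing import List, Tuple

# (mnemonic, operands popped, operands pushed): a simplified, hypothetical
# stack-effect model, not a full description of JVM bytecode.
Instr = Tuple[str, int, int]


def split_into_traces(code: List[Instr]) -> List[List[str]]:
    """Split straight-line code into stack-balanced traces."""
    traces: List[List[str]] = []
    current: List[str] = []
    depth = 0  # operand-stack depth relative to the start of the current trace
    for name, pops, pushes in code:
        if pops > depth:
            raise ValueError(f"{name} pops below the start-of-trace stack level")
        depth += pushes - pops
        current.append(name)
        if depth == 0:
            # The stack is back to its starting state, so the trace is complete.
            traces.append(current)
            current = []
    if current:
        raise ValueError("code ends with an unbalanced operand stack")
    return traces


# Two independent statements, roughly c = a + b and f = d * e.
example: List[Instr] = [
    ("iload_1", 0, 1), ("iload_2", 0, 1), ("iadd", 2, 1), ("istore_3", 1, 0),
    ("iload 4", 0, 1), ("iload 5", 0, 1), ("imul", 2, 1), ("istore 6", 1, 0),
]
traces = split_into_traces(example)
```

Under these assumptions the example yields two traces of four instructions each. Neither trace reads a stack operand produced by the other, which is the property that allows them to run on separate operand stacks.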

DSTRIDE: Data-cache miss-address-based stride-prefetching scheme for multimedia processors
Proceedings of the 6th Australasian Computer Systems Architecture Conference (AustCSAC'01)
Prefetching reduces cache miss latency by moving data up in memory hierarchy before they are actually needed. Recent hardware-based stride prefetching techniques mostly rely on the processor pipeline information (e.g. program counter and branch prediction table) for prediction. Continuing developments in processor microarchitecture drastically change core pipeline design and require that existing hardware-based stride prefetching techniques be adapted to the evolving new processor architectures. In this paper we present a new hardware-based stride prefetching technique, called DStride, that is independent of processor pipeline design changes. In this new design, the first-level data cache miss address stream is used for the stride prediction. The miss addresses are separated into load stream and store stream to increase the efficiency of the predictor. They are checked separately against the recent miss address stream to detect the strides. The detected steady strides are maintained in a table that also performs look-ahead stride prefetching when the processor stride reference rate is higher than the prefetch request service rate. We evaluated our design with multimedia workloads using execution-driven simulation with SimpleScalar toolset. Our experiments show that DStride is very effective in reducing overall pipeline stalls due to cache miss latency, especially for stride-intensive applications such as multimedia workloads.
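
The miss-address-based detection described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DStride hardware design: the `MissStrideDetector` class, its `history` size, and the rule that a stride counts as steady when three recent misses are equally spaced are all hypothetical, and the stride table with look-ahead prefetching is not modeled.

```python
from collections import deque
from typing import Deque, Dict, List


class MissStrideDetector:
    """Detects steady strides in the L1 data-cache miss address stream."""

    def __init__(self, history: int = 4) -> None:
        # Load and store misses are tracked separately, as in the abstract.
        self.recent: Dict[str, Deque[int]] = {
            "load": deque(maxlen=history),
            "store": deque(maxlen=history),
        }

    def on_miss(self, kind: str, addr: int) -> List[int]:
        """Record a miss and return the addresses worth prefetching."""
        recent = self.recent[kind]
        prefetches: List[int] = []
        for prev in recent:
            stride = addr - prev
            # Assumed steadiness rule: prev - stride, prev, addr are all misses.
            if stride != 0 and (prev - stride) in recent:
                target = addr + stride
                if target not in prefetches:
                    prefetches.append(target)
        recent.append(addr)
        return prefetches


detector = MissStrideDetector()
# A stride-64 load stream interleaved with an unrelated load stream.
results = [detector.on_miss("load", a) for a in (100, 1000, 164, 1008, 228)]
```

Because each new miss is compared against several recent misses rather than only the previous one, the stride of 64 is still found when another stream is interleaved with it. No program counter or other pipeline state is consulted, which mirrors the independence from pipeline design that the abstract emphasizes.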

PARDISC: A Cost Effective Model for Parallel and Distributed Computing
In the proceedings of Third International Conference on High-Performance Computing (HiPC '96)
A homogeneous system of PCs, workstations, minicomputers, etc. connected together via a local area network or wide area network represents a large pool of computational power. However, in a network of PCs and Workstations transparency is not provided, and hence, users are aware of other machines. PARDISC is a parallel programming environment, which provides the needed transparency as a scalable OpenFrame Computing Model. PARDISC stands for PARallel and DIStributed Computing on homogeneous network. It supports three models of computing by providing the functionalities required to view any homogeneous network as a Loosely Coupled Parallel Computer, or Processor Pool Architecture, or Cluster of Workstations. PARDISC aims at providing a cost effective parallel and distributed programming environment to the academic and R & D institutions since, it employs the existing well established LAN network and models it to support both the paradigms. This paper presents an overview, design and architecture, which discusses how PARDISC can be used to configure the network as loosely coupled parallel machine, processor pool architecture, and distributed computing environment with Logical Network Connectivity. Software Model discusses configuration servers, client processes, and processor pool servers and process communication interface of PARDISC. We end the paper with a description of some issues involved in its implementation on UNIX platform, and porting guidelines and its suitability for parallel programming.

Patents

Executing Java traces in a hardware JVM
Issued Oct 6, 2009
A processing architecture supports executing instructions in parallel after identifying at least one level of dependency associated with a set of traces within a segment of code. Each trace represents a sequence of logical instructions within the segment of code that can be executed in a corresponding operand stack. Scheduling information is generated based on a dependency order identified among the set of traces. Thus, multiple traces may be scheduled for parallel execution unless a dependency order indicates that a second trace is dependent upon a first trace. In this instance, the first trace is executed prior to the second trace. Trace dependencies may be identified at run-time as well as prior to execution of traces in parallel. Results associated with execution of a trace are stored in a temporary buffer (instead of memory) until after it is known that a data dependency was not detected at run-time.

Techniques for Java bytecode execution parallelism
Issued Apr 24, 2007
US 7210127
A system, method and apparatus for executing instructions in parallel identify a set of traces within a segment of code, such as Java bytecode. Each trace represents a sequence of instructions within the segment of code that are execution structure dependent, such as stack dependent. The system, method and apparatus identify a dependency order between traces in the identified set of traces. The dependency order indicates traces that are dependent upon operation of other traces in the segment of code. The system, method and apparatus can then execute traces within the set of traces in parallel and in an execution order that is based on the identified dependency order, such that at least two traces are executed in parallel and such that if the dependency order indicates that a second trace is dependent upon a first trace, the first trace is executed prior to the second trace. This system provides bytecode level parallelism for Java and other applications that utilize execution structure-based architectures and identifies and efficiently eliminates Java bytecode stack dependency.
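
The dependency-ordered execution described above can be pictured with a small Python sketch that groups traces into waves: every trace in a wave has all of its prerequisites satisfied and could be issued in parallel. The `parallel_waves` helper and the trace names are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for whatever scheduling mechanism the patent actually claims.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group traces into waves that respect the dependency order.

    depends_on maps each trace to the set of traces it must run after.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # traces with no pending prerequisites
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as executed
    return waves


# Hypothetical dependency order: T3 depends on T1; T4 depends on T2 and T3.
deps: Dict[str, Set[str]] = {
    "T1": set(),
    "T2": set(),
    "T3": {"T1"},
    "T4": {"T2", "T3"},
}
waves = parallel_waves(deps)
```

In this example T1 and T2 share the first wave and can execute in parallel, while T3 and T4 wait for the traces they depend on, matching the rule that a first trace executes before any second trace that depends on it.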