Gradients on the Clock

Application Status

This project is now permanently shelved due to a lack of participation in the Techlauncher program.

What is this project?

This is a pioneering venture at the intersection of hardware description language programming, optimization, and deep learning. The project aims to enhance the capabilities of the automatic differentiation engines critical to deep learning so that they can run on microprocessors at nanosecond timescales. Our theme centers on Robotic Learning, where we aspire to create robots that can learn and adapt on the go.

Unlike traditional project structures, this initiative follows an inverted format where the team drives decisions and sets targets, fostering a collaborative and dynamic research environment.

Whether you're interested in robotics, AI, hardware design, or software development, this project offers a unique opportunity to contribute to community development. This project is designed for a 7-member team of Undergraduate and/or Masters students.

Motivation

In the rapidly evolving fields of robotics and generative AI, deploying advanced models often relies on deep learning to handle a diverse range of tasks. These tasks span from basic object detection to sophisticated challenges such as predicting the next best view in a 3D scene. Central to the success of these models is the use of automatic differentiation engines, which compute the derivatives of mathematical functions. These computations, whether simple or complex, are fundamental to deep learning training and are executed millions of times, making their speed and efficiency crucial.
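As a purely illustrative aside, the short sketch below shows what an automatic differentiation engine does in practice: using PyTorch (mentioned later under participant requirements), a single reverse-mode backward pass yields exact gradients of a toy squared-error loss. The model and numbers are invented for illustration only.

```python
import torch

# Toy model: predict y = w * x + b and measure squared error.
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

x = torch.tensor(2.0)
y_true = torch.tensor(3.0)

loss = (w * x + b - y_true) ** 2

# Reverse-mode automatic differentiation: one backward pass produces
# exact gradients d(loss)/dw and d(loss)/db.
loss.backward()

print(w.grad.item())  # 2 * (w*x + b - y_true) * x = 2 * (1.0 - 3.0) * 2 = -8.0
print(b.grad.item())  # 2 * (w*x + b - y_true)     = 2 * (1.0 - 3.0)     = -4.0
```

On a workstation this backward pass is trivial; the project's interest is in how cheaply such gradient computations can be reproduced, millions of times over, on microcontroller- and FPGA-class hardware.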

Our project aims to discover and implement innovative methods, techniques, and algorithms that enhance the efficiency and effectiveness of these computational processes. In artificial intelligence, improving efficiency often translates to cost-effectiveness. Thus, our research targets one of AI's most pressing concerns: sustainable and efficient deployment.

Algorithms, despite their inherent complexities, must be computationally practical to be viable. In AI, the importance of speed goes beyond theoretical performance, emphasizing real-time execution with wall-clock time as the ultimate measure. The vast and dynamic nature of this field means that even small, novel improvements can have a significant impact on both practitioners and researchers in deep learning, driving meaningful advancements in the domain.

By addressing these objectives, we aim to make significant contributions to the efficiency of deep learning models, ultimately benefiting the broader AI and robotics communities.

About the Project

Gradients on the Clock embarks on the essential groundwork and prototype development phase, aimed at creating a cutting-edge computational framework. This project uniquely integrates microcontrollers, Field-Programmable Gate Arrays (FPGAs), and the GPUs/CPUs of a standard machine to facilitate the core operations of gradient descent, crucial for automatic differentiation. The core objective is to track the time, energy, and FLOPs needed to perform basic gradient descent on microprocessors, and to find the computationally cheapest means of achieving online learning on an FPGA with a microprocessor interface. The system is designed to bridge the gap between theoretical AI concepts and practical hardware applications, providing an educational foundation in hardware and software integration for AI and, specifically, TinyML.
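To make the measurement idea concrete, here is a minimal host-side sketch (an illustration only, not the project's actual framework) that times one gradient descent step for a toy linear least-squares model and estimates its FLOPs with a hand-counted cost model. The problem sizes and the cost model are assumptions made for this example.

```python
import time
import numpy as np

# Toy linear regression: minimise ||X @ w - y||^2 with one gradient descent step.
rng = np.random.default_rng(0)
n, d = 256, 8
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = np.zeros(d)
lr = 0.01

# Hand-counted FLOPs for one step (rough estimate, assumed cost model):
#   residual r = X @ w - y        : 2*n*d + n
#   gradient g = 2 * X.T @ r / n  : 2*n*d + d + d
#   update   w -= lr * g          : 2*d
flops_per_step = 4 * n * d + n + 4 * d

start = time.perf_counter()
r = X @ w - y
g = 2.0 * (X.T @ r) / n
w -= lr * g
elapsed = time.perf_counter() - start

print(f"one step: {elapsed * 1e6:.1f} us, ~{flops_per_step} FLOPs")
```

The same loop, ported to the microcontroller or implemented on the FPGA, would be instrumented for wall-clock time and energy rather than relying on a Python timer.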

Project Goals

Possible Learning Outcomes

Participants will gain a multi-dimensional educational experience, achieving the following enhanced learning outcomes:

Motivating References

In case I have overwhelmed you with details, relax: the project will be performed in phases depending on feasibility and the team's developing expertise. Note that I will provide a Basys3 FPGA board and an Arduino MEGA 2560 microcontroller board. I will also provide additional components as required.

Project Opportunities

Join a student-led Open Source Project at the forefront of AI and hardware interaction. This is a unique opportunity to find and potentially publish novel insights and contribute to transformative open-source AI research. By participating, you'll gain hands-on experience in hardware engineering, deep learning applications, robotics, and the mathematics of optimization. You'll enhance your technical skills and collaborate on a public platform that bridges the academic and industrial spheres.

Participant Requirements

We're seeking applicants with a robust foundation in university-level calculus and proficiency in Python. While prior knowledge of microprocessor programming and software development tools like Xilinx Vivado and Arduino is advantageous, it's not mandatory. Familiarity with web development (HTML, JavaScript, CSS) will also be helpful for contributing to our project documentation. Be prepared to acquire a diverse set of skills over the course of two semesters.

Additional Requirements: Candidates must demonstrate strong analytical thinking and problem-solving skills, and be self-sufficient in learning new concepts. Experience with machine learning frameworks such as PyTorch is highly desirable but not required. Commitment to collaborative development and an eagerness to learn new technologies are crucial. Applicants should also be ready to engage with both the theoretical and practical aspects of the project, including participating in discussions, writing reports, and implementing prototypes.

Join Us

This project is open to ANU School of Computing students as part of the Techlauncher program for the S2 2024 intake. We're actively seeking motivated students to join the inaugural team. The project is listed in the ANU Techlauncher Redmine. For more information, please contact me.