
Multimodal Language and Graph Learning of
Adsorption Configuration in Catalysis


In this study, we introduce a novel deep learning approach that combines transformer-based language models and graph neural networks (GNNs) to improve energy prediction in materials science. Our method, called graph-assisted pretraining, integrates BERT for processing text information and graph convolutions for structural data, creating a multimodal learning framework. By fine-tuning the model, we reduced the mean absolute error (MAE) of energy predictions from 0.71 eV to 0.35 eV, roughly halving the error. The approach also improves model transferability across datasets and shifts the model's focus toward adsorption configurations, in line with domain knowledge. Additionally, we propose using generative language models to produce text-based inputs for energy prediction, demonstrating a novel application of language models that does not rely on precise atomic coordinates.
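As a rough illustration of the multimodal setup, the sketch below pairs a BERT text encoder with a simple graph convolution and concatenates the two embeddings before an energy regression head. This is a minimal sketch under my own assumptions: the dense single-layer GCN, the layer sizes, and the mean pooling are illustrative, not the paper's implementation.

```python
# Minimal sketch (not the paper's exact code): fuse a BERT embedding of a
# text description with a GNN embedding of the structure, then regress energy.
import torch
import torch.nn as nn
from transformers import BertModel

class SimpleGCNLayer(nn.Module):
    """Dense graph convolution: H' = ReLU(A_hat @ H @ W) (illustrative)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # adj: normalized adjacency with self-loops, shape (N, N)
        return torch.relu(self.lin(adj @ h))

class MultimodalEnergyModel(nn.Module):
    """Concatenates text and graph embeddings before an energy head."""
    def __init__(self, node_dim=16, hidden=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.gcn = SimpleGCNLayer(node_dim, hidden)
        self.head = nn.Linear(self.bert.config.hidden_size + hidden, 1)

    def forward(self, input_ids, attention_mask, node_feats, adj):
        # One structure at a time for simplicity (batch size 1)
        text_emb = self.bert(input_ids=input_ids,
                             attention_mask=attention_mask).pooler_output
        graph_emb = self.gcn(node_feats, adj).mean(dim=0, keepdim=True)
        return self.head(torch.cat([text_emb, graph_emb], dim=-1))  # energy (eV)
```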

Programming Languages Used: Python (PyTorch)


Architecture of pretraining and fine-tuning

MOFGPT


We developed MOFGPT, a decoder-only transformer model that generates novel Metal-Organic Frameworks (MOFs) as SMILES strings, achieving a perplexity of 1.2. We incorporated policy gradients (the REINFORCE algorithm) to guide generation toward MOFs that meet criteria for energy efficiency, validity, and novelty, and fine-tuned a reward model for adsorption energy prediction that outperforms several property prediction models.
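The policy-gradient step can be sketched as follows. This is my illustrative reconstruction, not MOFGPT's exact training code: `model` stands for any Hugging Face-style causal LM over a SMILES vocabulary, and `reward_fn` is a hypothetical stand-in for the fine-tuned reward model (e.g., scoring validity, novelty, and predicted adsorption energy).

```python
import torch

def reinforce_step(model, tokenizer, reward_fn, optimizer,
                   batch_size=8, max_len=128):
    # 1) Sample candidate SMILES strings from the current policy (no grads here)
    prompts = torch.full((batch_size, 1), tokenizer.bos_token_id, dtype=torch.long)
    with torch.no_grad():
        seqs = model.generate(prompts, do_sample=True, max_length=max_len,
                              pad_token_id=tokenizer.eos_token_id)
    # 2) Score each decoded string with the (hypothetical) reward model
    rewards = torch.tensor([reward_fn(tokenizer.decode(s, skip_special_tokens=True))
                            for s in seqs], dtype=torch.float)
    # 3) Recompute token log-probs with gradients enabled
    logits = model(seqs).logits[:, :-1, :]              # token t predicts t+1
    logp = torch.log_softmax(logits, dim=-1)
    seq_logp = logp.gather(-1, seqs[:, 1:].unsqueeze(-1)).squeeze(-1).sum(dim=1)
    # 4) REINFORCE with a mean-reward baseline (padding ignored for brevity)
    loss = -((rewards - rewards.mean()) * seq_logp).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```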

Programming Languages Used: Python (PyTorch)


Architecture of pretraining and fine-tuning


Architecture of reinforcement-learning-based generation

Weakly Supervised Lidar Object Detection


In this project, I built a deep learning model for 3D object detection that can be trained with only class labels and 3D bounding box proposals produced by density-based clustering, without any ground-truth bounding box labels, achieving an AP (Average Precision) of 29 for the car class on the KITTI dataset.
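The proposal-generation step can be illustrated roughly as below: DBSCAN clusters the (ground-removed) point cloud, and each cluster's axis-aligned bounding box becomes a proposal. The parameters and the box format are my assumptions, not the project's exact settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def box_proposals(points, eps=0.5, min_points=10):
    """points: (N, 3) lidar points with ground removed.
    Returns axis-aligned box proposals as (cx, cy, cz, dx, dy, dz) rows."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points)
    boxes = []
    for k in set(labels) - {-1}:          # label -1 marks noise points
        cluster = points[labels == k]
        lo, hi = cluster.min(axis=0), cluster.max(axis=0)
        center, size = (lo + hi) / 2.0, hi - lo
        boxes.append(np.concatenate([center, size]))
    return np.array(boxes)
```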

Programming Languages Used: Python (PyTorch)


Ground truth vs prediction of objects

Deep Learning Based Point Cloud Registration


In this project, I modified the Deep Closest Point algorithm by replacing its DGCNN feature extractor with a graph attention module that extracts more robust local features, decreasing the MSE loss of the 3D rotation matrix regression by 14%.
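A minimal single-head version of such a graph attention feature extractor over k-NN neighborhoods might look like the sketch below; the layer sizes and the attention form are illustrative assumptions, not the actual module.

```python
import torch
import torch.nn as nn

class PointGraphAttention(nn.Module):
    """Single-head attention over each point's k nearest neighbors."""
    def __init__(self, in_dim=3, out_dim=64, k=16):
        super().__init__()
        self.k = k
        self.proj = nn.Linear(in_dim, out_dim)
        self.attn = nn.Linear(2 * out_dim, 1)

    def forward(self, x):                       # x: (B, N, 3) point coordinates
        d = torch.cdist(x, x)                   # pairwise distances (B, N, N)
        idx = d.topk(self.k, largest=False).indices   # k-NN (includes self)
        h = self.proj(x)                        # (B, N, C)
        B, N, C = h.shape
        nbrs = torch.gather(h.unsqueeze(1).expand(B, N, N, C), 2,
                            idx.unsqueeze(-1).expand(B, N, self.k, C))
        query = h.unsqueeze(2).expand_as(nbrs)
        a = torch.softmax(self.attn(torch.cat([query, nbrs], -1)), dim=2)
        return (a * nbrs).sum(dim=2)            # attended local features (B, N, C)
```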

Programming Languages Used: Python (PyTorch)


Before vs after registration

Lidar-Camera Autocalibration


In this project, I implemented the SOIC paper, which estimates the extrinsic calibration parameters between a lidar and a camera by maximizing the semantic consistency between image and point cloud segmentations, using Powell's conjugate direction method for the optimization.
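A simplified version of that optimization loop is sketched below: project the labeled lidar points into the segmented image, score the label agreement, and maximize it with SciPy's Powell optimizer. The cost function here is my simplified stand-in for the SOIC semantic consistency cost, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def semantic_cost(x, points, point_labels, seg_image, K):
    """Fraction of lidar points whose label matches the image segmentation
    label at their projected pixel. x = (rx, ry, rz, tx, ty, tz)."""
    R = Rotation.from_euler("xyz", x[:3]).as_matrix()
    cam = points @ R.T + x[3:6]
    front = cam[:, 2] > 0.1                       # keep points in front of camera
    uv = cam[front] @ K.T                         # K: 3x3 camera intrinsics
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    h, w = seg_image.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    matches = seg_image[uv[ok, 1], uv[ok, 0]] == point_labels[front][ok]
    return matches.mean() if matches.size else 0.0

def calibrate(points, point_labels, seg_image, K, x0):
    # Powell's conjugate direction method; maximize consistency by
    # minimizing its negative
    res = minimize(lambda x: -semantic_cost(x, points, point_labels, seg_image, K),
                   x0, method="Powell")
    return res.x    # optimized extrinsic parameters
```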

Programming Languages Used: Python (Scipy)


Before vs after calibration

Semantic SLAM in a Forest Environment


The goal of this project is to enable autonomous navigation of a UAV in a forest environment, allowing UAV and UGV robot teams to identify and eliminate potential triggers of forest fires.

I implemented a simultaneous localization and mapping (SLAM) algorithm that uses semantic information to enable autonomous drone navigation. The algorithm takes a semantically segmented point cloud, recursively estimates the diameter at breast height (DBH) of surrounding trees by solving a geometric least-squares problem, and uses these estimates both to localize the drone and to map the environment.
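The DBH step can be illustrated with an algebraic (Kasa) least-squares circle fit to a horizontal slice of trunk points, as sketched below; this is my reconstruction of the idea, not the project's exact formulation.

```python
import numpy as np

def estimate_dbh(slice_xy):
    """slice_xy: (N, 2) trunk points at roughly breast height (~1.3 m).
    Returns the fitted trunk center and DBH = 2r."""
    x, y = slice_xy[:, 0], slice_xy[:, 1]
    # Circle x^2 + y^2 + D*x + E*y + F = 0 is linear in (D, E, F)
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx**2 + cy**2 - F)
    return (cx, cy), 2.0 * r    # diameter at breast height
```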

Programming Languages Used: Python (Scipy)


The green point cloud represents the data from frame t-1, while the red point cloud corresponds to the data in frame t. The cyan point cloud (which overlaps with the red) is the transformed version of the point cloud from frame t-1, aligned to frame t using the localization obtained through our method.

Monocular Depth Estimation


In this project, we worked in a team of four to develop a UNet architecture in PyTorch with VGG-19 and MobileNet-V2 encoders. We trained the MobileNet-based model on the NYU-V2 depth and KITTI datasets, obtaining an SSIM (structural similarity index measure) of 0.95 and 0.84 and a PSNR (peak signal-to-noise ratio) of 43.14 and 31.2, respectively. We then converted the model to TensorRT, improving inference speed by 2.26x.
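The encoder-decoder pattern can be sketched as below: a torchvision MobileNet-V2 backbone feeding a small upsampling decoder that regresses a single-channel depth map. The skip connections of a full UNet are omitted and the layer sizes are assumptions, so this is a rough sketch rather than the project's architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = mobilenet_v2(weights="DEFAULT").features  # 1280-ch, /32

        def up(cin, cout):
            # Bilinear upsample + conv, doubling spatial resolution each stage
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

        self.decoder = nn.Sequential(up(1280, 256), up(256, 128), up(128, 64),
                                     up(64, 32), up(32, 16))     # x32 back up
        self.head = nn.Conv2d(16, 1, 3, padding=1)               # depth map

    def forward(self, x):
        return self.head(self.decoder(self.encoder(x)))

# e.g. depth = DepthNet()(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224)
```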

Programming Languages Used: Python (PyTorch)


Depth prediction results

Lidar Super Resolution


For this project, I developed a UNet-based super-resolution model that upsamples lidar data from 16 channels to 64 channels, trained on a custom dataset collected from the CARLA simulator. To extract 3D features from range images, I designed a PointNet-inspired kernel, and I replaced some CNN layers with these custom kernels and depth-wise separable kernels, reducing the number of parameters by 78% while achieving a lower Chamfer distance loss.
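Much of the parameter savings comes from the depth-wise separable replacement, which the standard building block below illustrates (channel counts are illustrative, not the project's).

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Standard conv factored into depthwise (per-channel) + pointwise (1x1)."""
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(cin, cin, k, padding=k // 2, groups=cin)
        self.pointwise = nn.Conv2d(cin, cout, 1)   # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Weight count (ignoring biases): k*k*cin + cin*cout vs. k*k*cin*cout for a
# standard conv; for cin=cout=64, k=3 that is 4,672 vs. 36,864 (~87% fewer).
```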

Software Used: Python (PyTorch), ROS, CARLA



Upsampled result vs ground truth point cloud
