Multimodal Language and Graph Learning of Adsorption Configuration in Catalysis
In this study, we introduce a novel deep learning approach that combines transformer-based language models and graph neural networks (GNNs) to improve energy prediction in materials science. Our method, called graph-assisted pretraining, integrates BERT for processing text information with graph convolution for structural data, creating a multimodal learning framework. Fine-tuning the model reduced the mean absolute error (MAE) of energy predictions from 0.71 eV to 0.35 eV, and graph-assisted pretraining improved accuracy by approximately 10%. The approach also improves transferability across datasets and shifts the model's attention toward adsorption configurations, in line with domain knowledge. Finally, we propose using generative language models to produce text-based inputs for energy prediction, an application of language models that does not rely on precise atomic coordinates.
Programming Languages Used: Python (PyTorch)
Architecture of pretraining and fine-tuning
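Graph-assisted pretraining aligns the language model's latent space with that of a GNN. The sketch below assumes a CLIP-style symmetric contrastive (InfoNCE) objective between paired text and graph embeddings; the project's actual loss, encoders, and the `temperature` value are not stated here and may differ. NumPy stands in for PyTorch so the example runs on its own.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def alignment_loss(text_emb: np.ndarray, graph_emb: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of text_emb should pair with row i of graph_emb.

    text_emb:  (batch, dim) embeddings from the language model (hypothetical, e.g. a pooled BERT output)
    graph_emb: (batch, dim) embeddings of the same adsorbate-catalyst systems from a pretrained GNN
    """
    t = l2_normalize(text_emb)
    g = l2_normalize(graph_emb)
    logits = t @ g.T / temperature  # (batch, batch) cosine-similarity matrix
    idx = np.arange(len(t))
    text_to_graph = -log_softmax(logits, axis=1)[idx, idx].mean()  # each text picks its graph
    graph_to_text = -log_softmax(logits, axis=0)[idx, idx].mean()  # each graph picks its text
    return float((text_to_graph + graph_to_text) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    graph = rng.normal(size=(8, 16))
    text = graph + 0.05 * rng.normal(size=(8, 16))  # nearly aligned pairs
    print("aligned pairs:", alignment_loss(text, graph))
    print("mismatched pairs:", alignment_loss(text[::-1], graph))
```

Minimizing this loss pulls each text embedding toward the graph embedding of the same system and away from the others, which is one way to "redirect" a text model toward structural information.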
MOFGPT
We developed MOFGPT, a decoder-only transformer model that generates novel Metal-Organic Frameworks (MOFs) represented as SMILES strings, achieving a perplexity of 1.2. I incorporated policy gradients (the REINFORCE algorithm) to steer generation toward MOFs that meet criteria for energy efficiency, validity, and novelty. I also fine-tuned a reward model for adsorption energy prediction that outperformed several property prediction models.
Programming Languages Used: Python (PyTorch)
Architecture of pretraining and fine-tuning
Architecture of reinforcement-learning-based generation
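The REINFORCE update can be illustrated on a toy, single-step policy. This is a simplified sketch, not MOFGPT's training loop: the 5-way categorical "vocabulary", the reward that favours one token, and the moving-average baseline are all hypothetical. For full SMILES sequences, the log-probability of a sequence is the sum of its token log-probabilities, and the same (reward minus baseline) weighting applies. A helper for perplexity, the metric quoted above, is included.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()


def perplexity(token_log_probs: np.ndarray) -> float:
    """exp of the mean negative log-likelihood per token; 1.0 means a perfectly confident model."""
    return float(np.exp(-np.mean(token_log_probs)))


def reinforce_step(logits, reward_fn, rng, lr=0.5, batch_size=32, baseline=0.0):
    """One REINFORCE update for a single-step categorical policy over `logits`.

    For a softmax policy, d log pi(a) / d logits = onehot(a) - pi, so the
    policy-gradient estimate is the batch mean of (reward - baseline) * (onehot(a) - pi).
    """
    probs = softmax(logits)
    actions = rng.choice(len(logits), size=batch_size, p=probs)
    rewards = np.array([reward_fn(a) for a in actions], dtype=float)
    onehots = np.eye(len(logits))[actions]
    grad = ((rewards - baseline)[:, None] * (onehots - probs)).mean(axis=0)
    return logits + lr * grad, rewards.mean()  # gradient ascent on expected reward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.zeros(5)  # 5 hypothetical "building blocks"
    baseline = 0.0
    for _ in range(100):
        # Hypothetical reward: only block 3 counts as a valid, novel MOF.
        logits, mean_reward = reinforce_step(logits, lambda a: float(a == 3), rng, baseline=baseline)
        baseline = 0.9 * baseline + 0.1 * mean_reward  # moving-average baseline reduces variance
    print("P(block 3):", softmax(logits)[3])
```

Subtracting a baseline leaves the expected gradient unchanged (the mean of `onehot - pi` under the policy is zero) while reducing its variance.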
Weakly Supervised Lidar Object Detection
In this project, I built a deep learning model for 3D object detection that can be trained with only class labels and 3D bounding-box proposals produced by density-based clustering, without any ground-truth bounding-box labels. The model achieved an AP (Average Precision) of 29 for the car class on the KITTI dataset.
Programming Languages Used: Python (PyTorch)
Ground truth vs prediction of objects
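Box proposals from density-based clustering can be sketched as DBSCAN followed by fitting an axis-aligned box to each cluster. This is an illustrative assumption: the project's clustering parameters, any ground-removal step, and whether boxes are oriented are not specified. The O(n²) distance matrix is fine for a toy example; real lidar scans would call for a spatial index (for example, scikit-learn's `DBSCAN`).

```python
import numpy as np


def dbscan(points: np.ndarray, eps: float, min_pts: int) -> np.ndarray:
    """Plain DBSCAN. Returns a cluster id per point; -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already assigned, or not a core point
        labels[i] = cluster  # unassigned core point: grow a new cluster from it
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:  # only core points expand the cluster
                    queue.extend(neighbors[j])
        cluster += 1
    return labels


def box_proposals(points: np.ndarray, labels: np.ndarray):
    """Axis-aligned 3D box (center, size) for every non-noise cluster."""
    boxes = []
    for c in sorted(set(labels.tolist()) - {-1}):
        pts = points[labels == c]
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        boxes.append(((lo + hi) / 2, hi - lo))
    return boxes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = np.vstack([
        rng.normal([0.0, 0.0, 0.0], 0.3, size=(50, 3)),   # hypothetical object A
        rng.normal([10.0, 0.0, 0.0], 0.3, size=(50, 3)),  # hypothetical object B
        [[5.0, 5.0, 5.0]],                                # isolated noise point
    ])
    for center, size in box_proposals(pts, dbscan(pts, eps=1.0, min_pts=5)):
        print("center", center.round(2), "size", size.round(2))
```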
Deep Learning Based Point Cloud Registration
In this project, I modified the Deep Closest Point (DCP) algorithm by replacing its DGCNN feature extractor with a graph attention module that extracts more robust local features, which decreased the MSE loss of 3D rotation-matrix regression by 14%.
Programming Languages Used: Python (PyTorch)
Before vs after registration
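Downstream of the feature extractor, DCP matches points softly via a softmax over feature similarities and then recovers the rigid transform in closed form with an SVD. The sketch below shows only that head; the one-hot `src_feat`/`tgt_feat` are hypothetical stand-ins for learned DGCNN or graph-attention embeddings, chosen so the correct matches are obvious.

```python
import numpy as np


def soft_correspondences(src_feat, tgt_feat, tgt_pts):
    """Each source point's match is a softmax-weighted average of target points."""
    scores = src_feat @ tgt_feat.T
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tgt_pts


def procrustes(src, tgt):
    """Rotation R and translation t minimizing sum ||R @ src_i + t - tgt_i||^2 (Kabsch / SVD)."""
    src_c, tgt_c = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - src_c).T @ (tgt - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # d = -1 would otherwise give a reflection
    return R, tgt_c - R @ src_c


def rotation_mse(R_pred, R_true):
    """Mean squared error between the entries of two 3x3 rotation matrices."""
    return float(np.mean((R_pred - R_true) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    src = rng.normal(size=(n, 3))
    th = np.deg2rad(30.0)
    R_true = np.array([[np.cos(th), -np.sin(th), 0.0], [np.sin(th), np.cos(th), 0.0], [0.0, 0.0, 1.0]])
    perm = rng.permutation(n)  # target points arrive in a different order
    tgt = (src @ R_true.T + np.array([0.5, -0.2, 1.0]))[perm]
    matched = soft_correspondences(20.0 * np.eye(n), 20.0 * np.eye(n)[perm], tgt)
    R, t = procrustes(src, matched)
    print("rotation MSE:", rotation_mse(R, R_true))
```

With learned features the softmax weights are less peaked, so the quality of the embeddings directly controls how accurate the SVD step can be, which is why swapping the feature extractor can change the rotation error.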
Lidar-Camera Autocalibration
In this project, I implemented the SOIC paper, which finds the extrinsic calibration parameters between a camera and a lidar by maximizing the consistency of semantic segmentation between the image and the point cloud data. The optimization uses Powell's conjugate direction method.
Programming Languages Used: Python (SciPy)
Before vs after calibration
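A semantic-consistency objective can be sketched by projecting labelled lidar points into the image and counting how many land on a pixel with the same label. This is a simplified assumption, not SOIC's exact cost: the extrinsics are reduced to yaw plus translation, and the intrinsics `K`, the scene, and the label image are synthetic. Because the score is piecewise constant in the parameters (points snap to pixels), a derivative-free optimizer such as Powell's method is a natural fit; in SciPy one would minimize `-semantic_consistency(...)` with `scipy.optimize.minimize(..., method="Powell")`.

```python
import numpy as np


def rotation_z(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def semantic_consistency(params, pts, pt_labels, label_img, K) -> float:
    """Fraction of lidar points whose projected pixel carries the same semantic label.

    params = (yaw, tx, ty, tz): a reduced extrinsic parameterization assumed for this sketch.
    Points that fall behind the camera or outside the image count as misses.
    """
    yaw, tx, ty, tz = params
    cam = pts @ rotation_z(yaw).T + np.array([tx, ty, tz])  # lidar frame -> camera frame
    front = cam[:, 2] > 0.1
    uvw = cam[front] @ K.T                                  # pinhole projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = label_img.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    hits = label_img[v[inside], u[inside]] == pt_labels[front][inside]
    return float(hits.sum()) / len(pts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = np.column_stack([rng.uniform(-2, 2, 2000), rng.uniform(-2, 2, 2000), rng.uniform(4, 8, 2000)])
    pt_labels = (pts[:, 0] > 0).astype(int)  # hypothetical: class 1 on the right-hand side
    K = np.array([[50.0, 0.0, 50.0], [0.0, 50.0, 50.0], [0.0, 0.0, 1.0]])
    label_img = np.zeros((100, 100), dtype=int)
    label_img[:, 50:] = 1
    print("true extrinsics:", semantic_consistency((0.0, 0.0, 0.0, 0.0), pts, pt_labels, label_img, K))
    print("1 m x-offset:  ", semantic_consistency((0.0, 1.0, 0.0, 0.0), pts, pt_labels, label_img, K))
```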
Semantic SLAM in a Forest Environment
The goal of the project is to enable autonomous UAV navigation in a forest environment, so that UAV and UGV robot teams can identify and eliminate potential triggers of forest fires.
I implemented a simultaneous localization and mapping (SLAM) algorithm that uses semantic information to enable autonomous drone navigation. The algorithm takes a semantically segmented point cloud, recursively estimates the diameter at breast height (DBH) of surrounding trees by solving a geometric least-squares problem, and uses these estimates both to localize the drone and to map the environment.
Programming Languages Used: Python (SciPy)
The green point cloud represents the data from frame t-1, while the red point cloud corresponds to the data in frame t. The cyan point cloud (which overlaps with the red) is the transformed version of the point cloud from frame t-1, aligned to frame t using the localization obtained through our method.
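Estimating DBH by geometric least squares can be sketched as fitting a circle to a horizontal slice of trunk points. This sketch assumes a Gauss-Newton fit on the point-to-circle distance residuals, initialized by the linear (Kåsa) algebraic fit, and uses a plain running mean as a hypothetical stand-in for the project's recursive estimator; the slice height, noise level, and half-arc visibility in the demo are made up.

```python
import numpy as np


def fit_circle_algebraic(xy: np.ndarray) -> np.ndarray:
    """Kasa fit: linear least squares on x^2 + y^2 + D x + E y + F = 0. Returns (cx, cy, r)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    D, E, F = np.linalg.lstsq(A, -(x**2 + y**2), rcond=None)[0]
    cx, cy = -D / 2, -E / 2
    return np.array([cx, cy, np.sqrt(cx**2 + cy**2 - F)])


def fit_circle_geometric(xy: np.ndarray, iters: int = 20) -> np.ndarray:
    """Gauss-Newton on the geometric residuals ||p_i - c|| - r, started from the algebraic fit."""
    params = fit_circle_algebraic(xy)
    for _ in range(iters):
        cx, cy, r = params
        dx, dy = xy[:, 0] - cx, xy[:, 1] - cy
        dist = np.hypot(dx, dy)
        residuals = dist - r
        J = np.column_stack([-dx / dist, -dy / dist, -np.ones_like(dist)])  # d residual / d (cx, cy, r)
        step = np.linalg.lstsq(J, -residuals, rcond=None)[0]
        params = params + step
        if np.linalg.norm(step) < 1e-10:
            break
    return params


def update_dbh(estimate: float, n_obs: int, new_dbh: float):
    """Hypothetical recursive update: running mean of per-frame DBH measurements."""
    n_obs += 1
    return estimate + (new_dbh - estimate) / n_obs, n_obs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    angles = rng.uniform(-np.pi / 2, np.pi / 2, 60)  # lidar sees only one side of the trunk
    xy = np.array([2.0, 1.0]) + 0.15 * np.column_stack([np.cos(angles), np.sin(angles)])
    xy += rng.normal(0.0, 0.005, xy.shape)
    cx, cy, r = fit_circle_geometric(xy)
    print(f"trunk center ({cx:.3f}, {cy:.3f}), DBH {2 * r:.3f} m")
```

Minimizing geometric distances rather than the algebraic residual avoids the Kåsa fit's bias toward smaller circles when only a partial arc of the trunk is visible.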
Monocular Depth Estimation
In this project, we worked in a team of four to develop a UNet architecture in PyTorch with VGG-19 and MobileNet-V2 encoders. We trained the MobileNet-based model on the NYU Depth V2 and KITTI datasets, obtaining an SSIM (structural similarity index measure) of 0.95 and 0.84 and a PSNR (peak signal-to-noise ratio) of 43.14 dB and 31.2 dB, respectively. We then converted the model to TensorRT, which improved inference speed by 2.26x.
Programming Languages Used: Python (PyTorch)
Depth prediction results
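The two reported metrics can be computed as follows. This is a sketch assuming depth maps scaled to `[0, data_range]`; the SSIM here treats the whole image as a single window, a simplification of the standard definition, which averages SSIM over local (typically Gaussian-weighted) windows. Reported numbers depend on these choices, so values from this sketch are not directly comparable to the ones above.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(data_range**2 / mse))


def global_ssim(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM with the whole image as one window (simplified; not the windowed standard SSIM)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2  # usual stabilizing constants
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(), target.var()
    cov = np.mean((pred - mu_x) * (target - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth_gt = rng.uniform(0.0, 0.8, size=(32, 32))  # hypothetical normalized depth map
    depth_pred = depth_gt + rng.normal(0.0, 0.02, depth_gt.shape)
    print(f"PSNR {psnr(depth_pred, depth_gt):.2f} dB, SSIM {global_ssim(depth_pred, depth_gt):.4f}")
```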
Lidar Super Resolution
For this project, I developed a UNet-based super-resolution model that upsamples lidar data from 16 channels to 64 channels. I trained the model on a custom dataset collected from the CARLA simulator. To extract 3D features from range images, I designed a PointNet-inspired kernel, and I replaced some CNN layers with these custom kernels and with depth-wise separable kernels, reducing the number of parameters by 78% while also achieving a lower Chamfer distance loss.
Software Used: Python (PyTorch), ROS, CARLA
Upsampled result vs ground-truth point cloud
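The Chamfer distance and the parameter savings from depth-wise separable convolutions can both be made concrete. This sketch uses one common Chamfer convention (mean squared nearest-neighbour distance, summed over both directions); others use unsquared distances or sums, so the project's exact definition is an assumption. The 64-channel, 3x3 layer is a hypothetical example, not the project's architecture.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    std = conv_params(3, 64, 64)
    sep = depthwise_separable_params(3, 64, 64)
    print(f"standard {std}, separable {sep}, reduction {1 - sep / std:.1%}")
    grid = np.stack(np.meshgrid(*[np.arange(3.0)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
    print("Chamfer after 0.1 m shift:", chamfer_distance(grid, grid + [0.1, 0.0, 0.0]))
```

For this single hypothetical layer the separable version needs 4,672 weights instead of 36,864 (about 87% fewer); the 78% figure above refers to the whole model, where only some layers were replaced.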