Laxman Singh Tomar

Senior NLP Engineer in Bangalore, India, He/him

About

Full Stack NLP Engineer @ Emplay Inc.

Work Experience

2022 — Now
Remote/Bangalore, India

Building next-gen conversational search and recommender systems.

2020 — 2022
Remote/Pune, India

Worked at the intersection of Cybersecurity and Machine Learning to build products around Content Moderation and Network Anomaly Detection.

2019 — 2020
Remote/Gurugram, India

Worked on building Voicenet, a speech recognition library that helps developers build voice-based applications such as age and emotion detection from speech samples.

Projects

2023
Conversational Search Engine at Emplay Inc.

Currently integrating the generative capabilities of LLMs such as GPT-3 and ChatGPT into search, generation, information retrieval, and multi-purpose agent products for SAP and P&G, using an LLM stack that includes LangChain.

2022
Content Moderation Engine at Emplay Inc.

Developed an AI-powered content moderation engine for Simpplr, using a dataset of ~2M comments. Trained a MiniLM model to detect racial and religious hate, insulting, and explicit comments with ~91% accuracy. Used ONNX conversion and dynamic quantization to optimize the solution for speed and storage space. Served in production with FastAPI and Docker; actively used by 500+ companies.

2022
Knowledge Search Engine at Emplay Inc.

Architected a scalable microservice-based system to build and search a knowledge index with over 1 million documents for P&G, supporting multiple tenants and controlled hierarchy via configs. Utilized Docker and Kubernetes for orchestration and OpenAPI standard for REST API design.

2022
Question-Answer Generation at Emplay Inc.

Built a pipeline for generating high-quality question-answer pairs from indexed text documents, using T5 and RoBERTa models fine-tuned on the SQuAD dataset. Designed and implemented asynchronous request handling with Celery and RabbitMQ, and deployed the pipeline as REST APIs using FastAPI on Docker and Kubernetes. Developed a monitoring dashboard and a test suite for unit and stress testing with Pytest and Locust.

2022
Question-Answer Evaluation at Emplay Inc.

Annotated a dataset of low- and good-quality QA pairs, and adopted techniques like Weak Supervision and Active Labeling to improve annotation efficiency. Engineered features capturing question presence, grammatical question structure, and text readability. Developed a Random Forest Classifier with ~75% accuracy to filter out low-quality questions, with data and experiment tracking via DVC and MLflow. Developed an evaluation suite to identify low-confidence samples via Cleanlab and key slices via Snorkel/SliceLine.

Side Projects

2022
In-Session Purchase Intent Predictions & Recommendations

RecSys for Cart Abandonment and Recommendations. Used the SIGIR Ecom 2021 Challenge Dataset. Used Prefect as Orchestrator and Metaflow for ML DAGs. Built an LSTM Model with TF/Keras and deployed it with Serverless.

2022
Tags Prediction for Projects

Multi-Class Text Classification using Project Metadata. Used a dataset comprising project titles, descriptions, and associated categories/tags. Used Airbyte for Data Pipelines, Airflow as Orchestrator, and Feast for Feature Store. Built a model using TF-IDF feature vectorization and an SGD Classifier to obtain an F1 score of 0.85.

2022
Podcast Search at Spotify

Replicating Semantic Podcast Search at Spotify. Used Listen Notes' dataset comprising metadata of 100k podcasts. Obtained a Recall@30 of 0.57 with a fine-tuned distilUSE model, compared to 0.29 for the same model without fine-tuning.

LinkedIn
Twitter