Ph.D. student
Office ISI 1038
Information Sciences Institute (ISI), University of Southern California
4676 Admiralty Way, Marina del Rey, CA, 90292
Email: binhlvu@usc.edu or binhvu@isi.edu

Software

SAND: A Tool for Creating Semantic Descriptions of Structured Sources

SAND is a tool for creating semantic descriptions semi-automatically. It integrates easily with semantic modeling systems that predict or suggest semantic descriptions to users, and it works with different knowledge graphs (KGs). Beyond its modeling capabilities, SAND provides browsing and querying tools that let users explore the data in a table and discover how it is typically modeled in KGs.

Paper Github
Dataset Representation Language for Reading Heterogeneous Datasets to RDF or JSON

  • Reading public datasets is a laborious task and frequently requires writing custom code, because data are often stored in many different formats (CSV, JSON, spreadsheet, NetCDF, etc.) with different layouts (row-based, matrix, hierarchical).
  • To address this problem, we created D-REPR, a language for representing heterogeneous datasets, and a very efficient D-REPR processor that reads datasets from their native formats into a common representation.
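
To make the problem concrete, here is a minimal sketch in plain Python (standard library only) of the kind of custom code this replaces. It is not D-REPR syntax or the D-REPR processor's API; the function names, field names, and sample values are all hypothetical. It reads the same facts from a row-based CSV and from a hierarchical JSON document into one common list of records.

```python
import csv
import io
import json


def read_csv_rows(text):
    """Row-based layout: each CSV row becomes one record."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json_hierarchy(text):
    """Hierarchical layout: flatten {site: {year: value}} into records."""
    nested = json.loads(text)
    return [
        {"site": site, "year": year, "value": str(value)}
        for site, by_year in nested.items()
        for year, value in by_year.items()
    ]


# Hypothetical sample data: the same facts stored in two different layouts.
CSV_TEXT = "site,year,value\nA,2019,100\nA,2020,120\n"
JSON_TEXT = '{"A": {"2019": 100, "2020": 120}}'

records_from_csv = read_csv_rows(CSV_TEXT)
records_from_json = read_json_hierarchy(JSON_TEXT)

# Both sources end up in the same common representation.
print(records_from_csv == records_from_json)
```

Each new format or layout needs another hand-written reader like these, which is the effort a declarative dataset description aims to avoid.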

Paper Github
KGData

A library for processing knowledge graph dumps (Wikipedia, DBpedia, Wikidata)

Github
Hugedict

A drop-in replacement for dictionary objects that are too big to fit in memory

Github
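
As an illustration of the idea behind Hugedict (a mapping that lives on disk but is used like a dictionary), here is a minimal sketch using Python's standard `shelve` module. This is not Hugedict's API; the file name and keys are hypothetical, and `shelve` is only a stand-in for the general technique.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical location for the on-disk mapping.
    path = os.path.join(tmpdir, "big_mapping")

    # The shelf is backed by a file, so entries do not all need to fit in memory.
    with shelve.open(path) as store:
        store["key-1"] = {"label": "first entry"}
        store["key-2"] = {"label": "second entry"}

        # Lookups, membership tests, and len() follow the usual dict protocol.
        label = store["key-1"]["label"]
        has_key_2 = "key-2" in store
        size = len(store)

print(label, has_key_2, size)
```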
Rsoup

A very fast web scraping library that handles text correctly

Github
Ream

A simple actor architecture for research software

Github
PBT

A Python build tool for multi-project repositories

Github
Gena

A framework that helps build (web) applications faster.

Github
YADA

Yet Another Dataclass Argument Parser

Github
Streamlit Bridge

Streamlit components that let the client side (JavaScript) send data to the server side (Python) and render HTML content without passing it through Markdown processing.

Github
Timer

Timing Python code made easy

Github
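
For readers unfamiliar with the pattern, here is a minimal sketch of timing a block of Python code with a context manager, using only the standard library. It is not the Timer library's API; the `timed` helper and the `timings` dictionary are hypothetical names chosen for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the block raises.
        results[label] = time.perf_counter() - start


timings = {}
with timed("sum", timings):
    total = sum(range(1000))

print(f"sum took {timings['sum']:.6f} seconds")
```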