Ph.D. student
Office ISI 1038
Information Sciences Institute (ISI),
University of Southern California
4676 Admiralty Way,
Marina del Rey, CA, 90292
Email: binhlvu@usc.edu or binhvu@isi.edu
I am a Ph.D. student in Computer Science at the University of Southern California (USC), advised by Prof. Craig Knoblock. Before joining USC in 2016, I earned my B.E. in Computer Science from HCMC University of Technology (honors program) in 2015.
I am broadly interested in machine learning, with a focus on techniques for knowledge graph construction. My current research is on semantic modeling for automatically publishing structured data sources to knowledge graphs. More information is available on my resume.
Building semantic descriptions of tables is a vital step in data integration. However, this task is expensive and time-consuming, as users often need to examine the table data, its metadata, and ontologies to find the most appropriate description. In this paper, we present SAND, a tool for creating semantic descriptions semi-automatically. SAND makes it easy to integrate semantic modeling systems that predict or suggest semantic descriptions to users, and to use different knowledge graphs (KGs). Besides its modeling capabilities, SAND is equipped with browsing and querying tools that let users explore the data in the table and discover how such data is typically modeled in KGs.
Paper Github
There are millions of high-quality tables available in Wikipedia. These tables cover many domains and contain useful information. To make use of these tables for data discovery or data integration, we need precise descriptions of the concepts and relationships in the data, known as semantic descriptions. However, creating semantic descriptions is a complex process that requires considerable manual effort and can be error-prone. In this paper, we present a novel probabilistic approach for automatically building semantic descriptions of Wikipedia tables. Our approach leverages hyperlinks in a Wikipedia table and existing knowledge in Wikidata to construct a graph of possible relationships in the table and its context, and then uses collective inference to distinguish genuine from spurious relationships to form the final semantic description. In contrast to existing methods, our solution can handle tables that require complex semantic descriptions of n-ary relations (e.g., the population of a country in a particular year) or implicit contextual values to describe the data accurately. In our empirical evaluation, our approach outperforms state-of-the-art systems on the SemTab2020 dataset and outperforms them by as much as 28% in F1 score on a large set of Wikipedia tables.
Paper Github Video
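A toy sketch of the first step described in the abstract above, building a graph of candidate relationships between table columns from linked cells and a knowledge graph, might look like the following. Everything here is an illustrative assumption: the dictionary-based KG, the hypothetical entity and property names (`ent:France`, `capital`), and the helpers `candidate_relationships` and `filter_relationships`. In particular, the final filter is a naive support-ratio threshold standing in for the paper's collective inference; it is not that method, and the sketch ignores n-ary relations and contextual values.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy KG: entity id -> property -> values (entity ids or literals).
KG = Dict[str, Dict[str, List[str]]]
# A candidate edge: (subject column, property, object column).
Edge = Tuple[int, str, int]


def candidate_relationships(rows: List[List[Optional[str]]], kg: KG) -> Dict[Edge, int]:
    """Count how many rows support each candidate edge between two columns.

    A cell holds a linked entity id (e.g. from a hyperlink), a literal, or None.
    """
    support: Dict[Edge, int] = defaultdict(int)
    for row in rows:
        for i, subj in enumerate(row):
            if subj is None or subj not in kg:
                continue  # only cells linked to a KG entity can act as subjects
            for prop, values in kg[subj].items():
                for j, obj in enumerate(row):
                    # The KG states subj --prop--> obj, so columns i and j may be related.
                    if j != i and obj is not None and obj in values:
                        support[(i, prop, j)] += 1
    return dict(support)


def filter_relationships(
    support: Dict[Edge, int], n_rows: int, min_ratio: float = 0.5
) -> Dict[Edge, int]:
    """Naive stand-in for collective inference: keep edges supported by enough rows."""
    return {edge: n for edge, n in support.items() if n / n_rows >= min_ratio}


# Hypothetical example: a two-column table of countries and cities.
kg: KG = {
    "ent:France": {"capital": ["ent:Paris"]},
    "ent:Japan": {"capital": ["ent:Tokyo"]},
    "ent:Peru": {"capital": ["ent:Lima"], "contains": ["ent:Cusco"]},
    "ent:Paris": {"country": ["ent:France"]},
}
rows: List[List[Optional[str]]] = [
    ["ent:France", "ent:Paris"],
    ["ent:Japan", "ent:Tokyo"],
    ["ent:Peru", "ent:Cusco"],
]
support = candidate_relationships(rows, kg)
kept = filter_relationships(support, len(rows))
print(kept)  # → {(0, 'capital', 1): 2}
```

A per-row support count is only the simplest signal; the approach described in the abstract instead performs collective inference over the whole candidate graph.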
This project automatically crawls online sources to retrieve the expertise of a company's employees and then predicts the software and hardware used in the company. Linking the predicted software and hardware to the CVE database yields a list of vulnerabilities.
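The final linking step of this project could be sketched as below, assuming the CVE data has been flattened into (CVE ID, CPE 2.3 name) pairs and that the prediction step has already produced (vendor, product) pairs. The function names, the placeholder IDs such as `CVE-EXAMPLE-1`, and the matching rule (case-insensitive vendor and product, version ignored) are hypothetical choices for illustration, not the project's implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical flattened record: (CVE id, CPE 2.3 formatted name).
Record = Tuple[str, str]


def parse_cpe23(cpe: str) -> Tuple[str, str, str]:
    """Return (vendor, product, version) from a CPE 2.3 formatted string.

    Simplification: assumes no escaped colons inside the fields.
    """
    parts = cpe.split(":")
    if len(parts) < 6 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return parts[3], parts[4], parts[5]


def link_to_cves(
    predicted: Iterable[Tuple[str, str]], records: Iterable[Record]
) -> Dict[Tuple[str, str], List[str]]:
    """Map each predicted (vendor, product) to the CVE ids whose CPE names it."""
    wanted = {(vendor.lower(), product.lower()) for vendor, product in predicted}
    hits: Dict[Tuple[str, str], List[str]] = {}
    for cve_id, cpe in records:
        vendor, product, _version = parse_cpe23(cpe)  # version is ignored here
        key = (vendor.lower(), product.lower())
        if key in wanted:
            hits.setdefault(key, []).append(cve_id)
    return hits


# Placeholder records and a hypothetical prediction for one company.
records: List[Record] = [
    ("CVE-EXAMPLE-1", "cpe:2.3:a:apache:tomcat:9.0.0:*:*:*:*:*:*:*"),
    ("CVE-EXAMPLE-2", "cpe:2.3:a:openssl:openssl:1.1.1:*:*:*:*:*:*:*"),
    ("CVE-EXAMPLE-3", "cpe:2.3:a:apache:tomcat:8.5.0:*:*:*:*:*:*:*"),
]
hits = link_to_cves([("Apache", "Tomcat")], records)
print(hits)  # → {('apache', 'tomcat'): ['CVE-EXAMPLE-1', 'CVE-EXAMPLE-3']}
```

Ignoring versions over-reports vulnerabilities; a stricter variant would also compare the version field against the version actually deployed.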
Learning Semantic Models of Data Sources Using Probabilistic Graphical Models
Binh Vu, Craig Knoblock, and Jay Pujara
In Proceedings of WWW ’19: The World Wide Web Conference, 2019
Link Paper
D-REPR: A Language for Describing and Mapping Diversely-Structured Data Sources to RDF
Binh Vu, Jay Pujara, and Craig Knoblock
In Proceedings of K-CAP ’19: The Knowledge Capture Conference, 2019
Link Paper Slides
A Graph-based Approach for Inferring Semantic Descriptions of Wikipedia Tables
Binh Vu, Craig Knoblock, Pedro Szekely, Minh Pham and Jay Pujara
In Proceedings of ISWC ’21: International Semantic Web Conference, 2021
Link Paper Video Slides
SAND: A Tool for Creating Semantic Descriptions of Tabular Sources
Binh Vu and Craig Knoblock
Demo at ESWC ’22: European Semantic Web Conference, 2022
Paper