BrushSearch

The ARQMath Lab 2020 and 2021

The ARQMath Lab series, whose name stands for “Answer Retrieval for Questions on Math”, was held in 2020 and 2021 as part of the Conference and Labs of the Evaluation Forum (CLEF). The series ran the first Mathematical Community Question Answering (MathCQA) task, which uses real-life math questions selected from the community question answering forum Math StackExchange and provides an evaluation platform for math-aware search engines.

MathDowsers at the MathCQA Task (Task 1)

Under the team name MathDowsers (a team of researchers from the University of Waterloo who are interested in “dowsing” for answers to math questions), we participated in the MathCQA task in ARQMath-1 and ARQMath-2 with Tangent-L. The MathCQA task asks participating systems to retrieve answers posted to earlier questions in the community forum that could potentially answer each given math question.

The MathDowsers team's submissions with Tangent-L achieved the best participant runs in both years in terms of the primary effectiveness measure, nDCG', and another popular measure, MAP', as shown below.
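For readers unfamiliar with the primed measures: nDCG' and MAP' are computed after removing unjudged documents from each ranked list, so a run is not penalized for answers that the assessors never saw. The following is a minimal Python sketch of nDCG', assuming graded gains equal to the relevance grade, the common 1/log2(rank + 1) discount, and a hypothetical function name `ndcg_prime` with a hypothetical default cutoff; the official ARQMath scores were produced by the organizers' evaluation tooling, which may differ in such details.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank r (starting at 1) divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_prime(ranking, qrels, k=1000):
    """nDCG' (sketch): drop unjudged documents from the ranking, then compute nDCG.

    ranking: document ids in retrieved order.
    qrels:   judged document id -> graded relevance (0 means judged non-relevant).
    k:       hypothetical cutoff applied to the condensed (judged-only) list.
    """
    # Condense the list: keep only documents that have a relevance judgment.
    condensed = [doc for doc in ranking if doc in qrels][:k]
    gains = [qrels[doc] for doc in condensed]
    # Ideal ordering: all judged grades sorted from most to least relevant.
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Because unjudged documents are dropped before scoring, inserting an unjudged id anywhere in `ranking` leaves the score unchanged.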

ARQMath 2020 MathCQA (Task 1) Results Summary

ARQMath 2021 MathCQA (Task 1) Results Summary

MathDowsers at the In-context Formula Retrieval Task (Task 2)

We also participated in the in-context formula retrieval task in ARQMath-2, in which participating systems are asked to retrieve formulas from the community forum that are useful with respect to a formula identified in each given math question. Our primary submission achieved the best result among the automatic runs, and its nDCG' is almost indistinguishable from that of the best participant run that year, as shown below.

ARQMath 2021 In-context Formula Retrieval (Task 2) Results Summary

The results show that, compared with other participating systems, MathDowsers' submissions with Tangent-L are particularly strong at finding answers to formula-dependent questions. Full details of the MathDowsers' submissions are described in the ARQMath Lab working papers for ARQMath-1 and ARQMath-2; the former was accepted at CLEF 2021 as one of the Best of 2020 Labs papers. The MathDowsers' Browser allows users to explore the search results of Tangent-L for the MathCQA task in ARQMath-1 and ARQMath-2.

Handwritten Math Generator

Implementing a math expression recognition system based on deep learning and neural networks requires a very large number of diverse handwritten expressions for training and testing the recognizer.

Our approach to generating this dataset has five steps: (1) convert a typeset expression into a Symbol Layout Tree (SLT), capturing how the pieces of the formula are laid out when printed; (2) traverse the SLT and construct a layout based on the edge types and the symbols' relative locations; (3) query a Unicode font for spatial symbol information (relative sizes and locations with respect to the baseline); (4) sample normalized handwritten symbols from a dataset and insert them into the layout; and (5) apply local and global distortion models to ensure variability in the output expressions.
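The five steps can be illustrated with a toy Python sketch. It is not the generator's implementation, and every name in it is hypothetical: `FONT_METRICS` holds made-up numbers standing in for the Unicode font query of step (3), `SCRIPT_RULES` is an assumed scale-and-shift rule for superscripts and subscripts, only "right", "sup" and "sub" edges are handled, the SLT of step (1) is built by hand rather than parsed from typeset input, the handwritten sample of step (4) is a single placeholder stroke, and the distortion of step (5) is reduced to random per-symbol shifts plus one global rotation.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical stand-in for step (3): symbol -> (width, height, offset of the
# glyph bottom from the baseline), in em units. Values are made up.
FONT_METRICS = {
    "x": (0.5, 0.5, 0.0),
    "2": (0.5, 0.7, 0.0),
    "+": (0.6, 0.6, 0.1),
    "y": (0.5, 0.7, -0.2),
}

# Hypothetical (scale factor, baseline shift) for superscript and subscript edges.
SCRIPT_RULES = {"sup": (0.6, 0.45), "sub": (0.6, -0.25)}


@dataclass
class SLTNode:
    """Step (1): a Symbol Layout Tree node; children are keyed by edge type."""
    symbol: str
    children: dict = field(default_factory=dict)  # "right" | "sup" | "sub" -> SLTNode


def layout(node, x=0.0, y=0.0, scale=1.0, boxes=None):
    """Step (2): traverse the SLT and place every symbol.

    Appends (symbol, left, bottom, width, height) boxes and returns
    (boxes, x coordinate where the next symbol on this baseline may start).
    """
    if boxes is None:
        boxes = []
    w, h, offset = FONT_METRICS[node.symbol]
    boxes.append((node.symbol, x, y + offset * scale, w * scale, h * scale))
    end_x = x + w * scale
    script_end = end_x
    for edge, (child_scale, shift) in SCRIPT_RULES.items():
        child = node.children.get(edge)
        if child is not None:
            # Scripts start right after the base symbol, smaller and shifted vertically.
            _, child_end = layout(child, end_x, y + shift * scale, scale * child_scale, boxes)
            script_end = max(script_end, child_end)
    right = node.children.get("right")
    if right is not None:
        # The next symbol on the same baseline starts after any scripts.
        return layout(right, script_end, y, scale, boxes)
    return boxes, script_end


def place_sample(points, box):
    """Step (4): map a normalized handwritten sample (points in [0, 1]^2) into a box."""
    _, left, bottom, w, h = box
    return [(left + px * w, bottom + py * h) for px, py in points]


def distort(strokes, rng, max_angle=0.05, jitter=0.02):
    """Step (5), simplified: a random shift per symbol stroke (local), then one
    small rotation about the origin applied to the whole expression (global)."""
    angle = rng.uniform(-max_angle, max_angle)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for stroke in strokes:
        dx, dy = rng.uniform(-jitter, jitter), rng.uniform(-jitter, jitter)
        out.append([((px + dx) * cos_a - (py + dy) * sin_a,
                     (px + dx) * sin_a + (py + dy) * cos_a) for px, py in stroke])
    return out


# x^2 + y: "x" has a superscript "2" and continues right to "+", then "y".
expr = SLTNode("x", {"sup": SLTNode("2"), "right": SLTNode("+", {"right": SLTNode("y")})})
boxes, width = layout(expr)
# Placeholder "handwritten" sample: one diagonal stroke, reused for every symbol.
sample = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
strokes = [place_sample(sample, box) for box in boxes]
handwritten = distort(strokes, random.Random(0))
```

Seeding a `random.Random` instance makes each generated expression reproducible, while different seeds yield different distorted variants of the same layout.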