AUG 10, 2022
The Best of arXiv.org for Artificial Intelligence, Machine Learning, and Deep Learning
DEC 20, 2021
In this recurring monthly feature, we filter recent research papers appearing on the arXiv.org preprint server for compelling subjects relating to AI, machine learning, and deep learning, drawn from disciplines such as statistics, mathematics, and computer science, and provide you with a useful “best of” list for the previous month. Researchers from all around the globe contribute to this repository as a prelude to peer review for traditional journal publishing.
Keep in mind that these are academic research papers, typically geared toward graduate students, postdocs, and seasoned professionals. They generally contain a high degree of mathematics, so be prepared. Enjoy!
The arXiv is a true gold mine of statistical learning approaches that you could one day employ to solve data science problems. The papers listed below are only a small fraction of the articles available on the preprint server. They appear in no particular order, and each entry includes a link to the paper along with a brief summary. Links to GitHub repositories are provided when available. A “thumbs up” indicator marks items that are particularly relevant.
Many existing neural architecture search (NAS) solutions rely on downstream training for architecture evaluation, which requires enormous computation. Because this computation carries a large carbon footprint, the paper explores a green (that is, environmentally friendly) NAS solution that evaluates architectures without training. Intuitively, the gradients induced by the architecture itself directly determine the convergence and generalization results.
A new kernel-based architecture search approach, KNAS, is proposed. Experiments show that KNAS achieves competitive results while being orders of magnitude faster than “train-then-test” paradigms on image classification tasks. Furthermore, its extremely low search cost enables wide application. The searched network also outperforms the strong baseline RoBERTa-large on two text classification tasks.
This paper proposes the gradient kernel hypothesis: gradients can be used as a coarse-grained proxy for downstream training when evaluating randomly initialized networks. To support the hypothesis, a theoretical analysis is conducted to find a practical gradient kernel that correlates well with training loss and validation performance.
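To make the idea of a training-free, gradient-based score concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the toy one-hidden-layer network, the squared-error loss, and the choice of the mean of the per-sample gradient Gram matrix as the score are all assumptions made for this example, and the function names are hypothetical.

```python
import math
import random


def init_params(n_in, n_hidden, rng):
    """Randomly initialise a toy one-hidden-layer tanh network (assumed model)."""
    w1 = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
          for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
    return w1, w2


def per_sample_gradient(params, x, y):
    """Flattened gradient of 0.5 * (f(x) - y)^2 with respect to all weights."""
    w1, w2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    err = sum(w * hi for w, hi in zip(w2, h)) - y
    grad_w2 = [err * hi for hi in h]
    # Chain rule through tanh: d tanh(z) / dz = 1 - tanh(z)^2.
    grad_w1 = [err * w2j * (1 - hj * hj) * xi
               for w2j, hj in zip(w2, h) for xi in x]
    return grad_w1 + grad_w2


def gradient_kernel_score(params, batch):
    """Mean entry of the Gram matrix of per-sample gradients (assumed score)."""
    grads = [per_sample_gradient(params, x, y) for x, y in batch]
    n = len(grads)
    total = sum(sum(a * b for a, b in zip(gi, gj))
                for gi in grads for gj in grads)
    return total / (n * n)


# Score two hypothetical candidate widths at random initialisation, no training.
rng = random.Random(0)
batch = [([rng.uniform(-1, 1) for _ in range(4)], rng.uniform(-1, 1))
         for _ in range(8)]
candidates = {"narrow": 2, "wide": 16}
scores = {name: gradient_kernel_score(init_params(4, width, rng), batch)
          for name, width in candidates.items()}
print(scores)
```

Each candidate is scored with one pass of gradient computations over a small batch, which is where the cost saving over "train-then-test" evaluation would come from. How such a score should be used to rank real architectures is up to the paper's own analysis and experiments.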
Natural Language Processing