CMU Researchers Introduce the Open Whisper-Style Speech Model: Advancing Open-Source Solutions for Efficient and Transparent Speech Recognition Training
SOURCE: HTTPS://WWW.MARKTECHPOST.COM/
OCT 03, 2023
Large-scale Transformers have attracted a great deal of attention in natural language processing (NLP). Trained on massive datasets, these models have demonstrated remarkable emergent abilities across a wide range of downstream applications. Comparable pre-training methods have also been applied successfully to speech processing. Large-scale supervised learning is a promising path toward universal speech models that can handle many speech tasks within a single model. OpenAI Whisper [15] is a collection of multilingual, multitask models trained on 680k hours of labeled speech data carefully curated from various online sources.
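To make the "many tasks in one model" idea concrete, here is a minimal sketch using the openly released whisper Python package (pip install openai-whisper); the audio file names are placeholders and the model size is chosen arbitrarily.

```python
# Minimal sketch of Whisper-style multitask inference with the open-source
# "whisper" Python package. File names below are placeholders.
import whisper

model = whisper.load_model("small")  # one model handles several tasks

# Multilingual transcription: the model detects the spoken language itself.
result = model.transcribe("speech.wav")
print(result["language"], result["text"])

# Any-to-English speech translation is selected purely by the task setting.
translated = model.transcribe("speech_de.wav", task="translate")
print(translated["text"])
```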
Despite the publication of pre-trained Whisper models and inference code, the complete model-building process, from data preparation to training, remains unavailable to the general public, a situation that has become common for large language models (LLMs). This restriction raises several issues.
There has recently been a determined push to promote open science in LLM research by publishing comprehensive training pipelines. This inspired the research team from Carnegie Mellon University, Shanghai Jiao Tong University, and the Honda Research Institute to create the Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OWSM adopts the Whisper framework to handle crucial tasks, including language identification (LID), multilingual automatic speech recognition (ASR), and utterance-level segmentation.
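The sketch below shows how a Whisper-style model can be steered toward LID, ASR, or segmentation through a short prefix of special decoder tokens. The token names mirror Whisper's published format and are illustrative only; OWSM's actual vocabulary may differ.

```python
# Illustrative only: how a Whisper-style multitask model is conditioned on a
# task via special decoder tokens. Token names mirror Whisper's public format
# and are not taken from the OWSM release.
from typing import List, Optional

def build_prompt(language: Optional[str], task: str, timestamps: bool) -> List[str]:
    """Build the decoder prefix that selects the task for a single model."""
    tokens = ["<|startoftranscript|>"]
    if language is None:
        # No language token supplied: the model's first prediction is the
        # language itself, i.e. language identification (LID).
        return tokens
    tokens.append(f"<|{language}|>")       # e.g. <|en|>, <|zh|>
    tokens.append(f"<|{task}|>")           # e.g. <|transcribe|>, <|translate|>
    if not timestamps:
        tokens.append("<|notimestamps|>")  # skip utterance-level segmentation
    return tokens

# Multilingual ASR with utterance-level timestamps:
print(build_prompt("zh", "transcribe", timestamps=True))
# Language identification (no language token supplied):
print(build_prompt(None, "transcribe", timestamps=False))
```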
Notably, OWSM also introduces several technical innovations. Instead of supporting only any-to-English translation, it handles any-to-any speech translation. OWSM also employs a variety of strategies to improve efficiency. Reproducible recipes will cover the entire pipeline, including data preparation, training, inference, and scoring. The team also plans to release pre-trained models and training logs, letting researchers examine the mechanics of the training procedure and draw insights for their own work.
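Once the pre-trained models are released, loading one might look roughly like the following. This is only a sketch assuming the OWSM checkpoints are served through ESPnet's speech-to-text inference interface; the model tag, token symbols, and keyword names are assumptions rather than the confirmed API, so check the published recipes before relying on them.

```python
# Hypothetical sketch of loading a released OWSM checkpoint through ESPnet.
# The model tag, lang/task symbols, and keyword names are assumptions.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text  # assumed entry point

s2t = Speech2Text.from_pretrained(
    "espnet/owsm_v1",      # hypothetical model tag
    lang_sym="<eng>",      # assumed language token
    task_sym="<asr>",      # assumed task token (ASR rather than translation)
)

speech, rate = sf.read("utterance.wav")  # placeholder 16 kHz mono recording
text, *_ = s2t(speech)[0]                # best hypothesis
print(text)
```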
While OWSM performs comparably to Whisper, and even better on some metrics, its goal is not to engage in a protracted arms race with Whisper. The team's largest dataset amounts to only about 25% of the training set used by Whisper, and resource constraints prevent them from executing numerous trial runs.
In the future, the team plans to explore several further directions.