A Data Scientist’s Guide to Saving Time


SOURCE: DATANAMI.COM
MAR 14, 2022

To celebrate and showcase the work of the data science community, Z by HP recently released “Unlocked,” an interactive 35-minute film that weaves data science challenges into an action-packed story. In the film, Dr. Eva Ramirez is searching for a mysterious flower rumored to cure the rare neurological disease that plagues her family. However, she’s not the only one looking for this elusive plant: a local cartel is dead set on securing it for themselves. The film and companion website, www.hp.com/unlocked, give data scientists the opportunity to participate in a series of problem-solving challenges while showcasing the value of data science to non-technical stakeholders through a compelling narrative.

Though the average data scientist’s day-to-day responsibilities may not include outsmarting a cartel, you stand at the crossroads of science, engineering, business intelligence, and mathematics. You have a lot to juggle, from coding and data visualization to cleaning up datasets and handling ad-hoc requests. Unless you strictly track your time, you might be surprised by how you spend it. In a recent HP survey of 350 data scientists worldwide, 48% said they spent more time organizing their data than actually analyzing it[1].

Your time is valuable, so efficiencies are essential tools of the trade. Here are several tips for optimizing your workflow and making the best use of your time.

Save time with proactive communication

Establishing communication touchpoints to ensure you’re making the right decisions throughout the project is essential to saving time. Forty percent of surveyed data scientists mention that they often start working with data before fully understanding the business objectives[2]. This lack of communication often leads to managers having unrealistic expectations about the project’s outcome.

No matter how technical the topic, it’s important to communicate in a way that makes sense to the people who have to implement it. A critical distinction: business stakeholders tend to think in binary outcomes, while data science is painted in shades of uncertainty. Aligning your data-driven approach with your stakeholders’ way of thinking is essential.

Save time by getting to know your data upfront

Not only is your time precious, but it’s often split between long-term and short-term projects. That makes using it well paramount, and time invested at the start of any project can pay enormous dividends later.

One common mistake that can cost you time downstream is starting the modeling phase too soon, before you really understand your data. When you begin a new project, you’re no doubt eager to start modeling, but experienced data scientists know better. By dedicating time to exploring the data first, even as much as a day or two, you can discover patterns that will help inform your model. In the end, that’s a huge time gain.
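As a rough illustration of what that first look can involve, here is a minimal sketch that profiles each column of a small table before any modeling starts. It uses only the Python standard library; the sample data, the column names, and the `profile_rows` helper are all hypothetical stand-ins, not part of any HP tool.

```python
import csv
import io
import statistics
from collections import Counter


def profile_rows(rows):
    """Summarize each column: missing count, distinct values, basic stats if numeric."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        info = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            # Non-numeric column: report the most common value instead.
            info["top"] = Counter(present).most_common(1)[0][0]
        else:
            if numbers:
                info["mean"] = statistics.mean(numbers)
                info["min"] = min(numbers)
                info["max"] = max(numbers)
        summary[col] = info
    return summary


# Hypothetical sample data standing in for a real file.
SAMPLE_CSV = """region,units,price
north,10,2.5
south,,3.0
north,7,
east,12,4.5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_rows(rows)
for column, info in report.items():
    print(column, info)
```

Even a summary this small surfaces the questions worth asking early: which columns have gaps, which are categorical, and whether the numeric ranges look plausible.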

Likewise, a huge part of any data science project is documentation. Getting the appropriate documentation locked down early is a critical way to improve efficiency and save time. Don’t neglect documenting your own code as you go. Poorly documented code is a bad habit common to software engineers and data scientists alike, so never assume you’ll remember what you intended even a week later, much less a month. Spending time on documentation means you won’t have to decipher your code every time you go back to it.
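One lightweight way to document as you go is a docstring that records not just what a function does but why it behaves as it does. The sketch below is illustrative only: `min_max_scale` and the reasoning in its docstring are hypothetical examples, not code from the survey or from any specific project.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale numbers linearly into ``feature_range``.

    Why this exists (hypothetical): a downstream model expects every
    input feature on a common scale.

    Args:
        values: Non-empty sequence of numbers.
        feature_range: (low, high) bounds of the output.

    Returns:
        A list of floats the same length as ``values``.

    Notes:
        If every value is identical there is no spread to scale, so each
        output is set to the lower bound rather than dividing by zero.
    """
    low, high = feature_range
    smallest, largest = min(values), max(values)
    spread = largest - smallest
    if spread == 0:
        return [float(low)] * len(values)
    return [low + (v - smallest) / spread * (high - low) for v in values]


scaled = min_max_scale([10, 20, 40])
print(scaled)
```

The "Notes" entry is the part you are most likely to forget a month later: it captures a decision, not just a description.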

Save time with the right tools, configurations, and accessories for your workspace

Configuring a new computer is always a challenge. When surveyed, 42% of data scientists said they spend too much time configuring their data environment, losing an average of five hours per week[3]. The most fundamental time-saver for data scientists will always be the power of their workstation. That’s why Z by HP is constantly innovating to bring data scientists the high-compute workstations, displays, and tools they need to manage their tasks as seamlessly as possible.

One significant way to improve your efficiency is by adopting Windows Subsystem for Linux. WSL 2[4] lets you run Linux tools, utilities, and applications directly within Windows, without the overhead of a dual-boot configuration or a traditional virtual machine. That reduces friction and speeds up your workflow.

Likewise, preconfigured software stacks can be nothing short of a revelation for data scientists. The Z by HP Data Science Software Stack is essentially a comprehensive suite of applications and environments — everything pre-loaded with automatic updates, avoiding the inevitable software incompatibilities and troubleshooting time that plague routine setups.

Simple things also make a big difference day to day. Many accessory choices are personal decisions, driven by comfort, convenience, and preference. Adding the right accessories, from the right mouse to a curved display, can help you optimize your workflow and save time.

Save time with workflow automations

In the same way you spend your days optimizing models, there are opportunities to optimize your daily routines for better productivity. Depending on where you are in your data science career, you likely already optimize your workflow in some ways. However, it’s important to be mindful of additional ways to optimize your time with increasing sophistication as your skills and experience grow.

A more sophisticated tool for your toolkit? Automation. Any data scientist with some experience in the rear-view mirror knows the value in automating their workflow. After all, it doesn’t take long to see that some tasks require a lot of manual processing and may need to be done again and again, so automating those tasks can save enormous amounts of time. Just a few commands can enable you to run an entire process autonomously.
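To make the idea concrete, here is a minimal sketch of chaining repetitive steps so that one call runs the whole process. The step functions (`load`, `clean`, `summarize`), the sample records, and the `run_pipeline` helper are hypothetical placeholders; a real workflow would swap in its own steps.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def load(_):
    """Hypothetical step: stand-in for reading raw records from a file or database."""
    return [" 12", "7 ", "", "n/a", "30"]


def clean(raw):
    """Strip whitespace and drop entries that are not whole numbers."""
    stripped = (item.strip() for item in raw)
    return [int(item) for item in stripped if item.isdecimal()]


def summarize(numbers):
    """Reduce the cleaned numbers to the figures a report would need."""
    return {"count": len(numbers), "total": sum(numbers)}


def run_pipeline(steps, data=None):
    """Run each step in order, feeding one step's output into the next."""
    for step in steps:
        log.info("running %s", step.__name__)
        data = step(data)
    return data


result = run_pipeline([load, clean, summarize])
print(result)
```

Once the steps live in one script like this, an operating-system scheduler such as cron or Windows Task Scheduler can run it unattended, which is where the time savings start to compound.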

As evidenced by all the above tips, there are so many habits you can adopt that will make the work you do as a data scientist more efficient. Even a few small steps can make the time savings add up.

[1] HP proprietary research: Understanding Data Scientists, November 2021.
[2] HP proprietary research: Understanding Data Scientists, November 2021.
[3] HP proprietary research: Understanding Data Scientists, November 2021.
[4] WSL 2 requires Windows 10 or higher, Intel Core i5 processor or higher and is available on select Z workstations. You must be running Windows 10 version 21H2 and higher (Build 19044 and higher) or Windows 11.
