Why you’re probably wasting 40% of your time on data preparation – and how to reclaim it
Spending 40% of your time on data cleanup? See why it happens – and how better tools can give you time back for real discovery.
Laura Higgins
September 9, 2025
5 min. read
Scientists and engineers spend a staggering 40% or more of their time on data cleanup and preparation. Not only is this often a frustrating task; it's also a major bottleneck that delays innovation. This post will demystify why this happens and show you how to fix it with the right tools.
Why is data preparation so time-consuming?
Data preparation goes far beyond fixing a few simple errors or inconsistencies. It involves moving from raw data to cleaned data, and, ultimately, to analysis-ready data that supports sound decisions. Why does it take so long?
- It’s multistage: Data cleanup isn’t a single task – it involves gathering, combining, cleaning, and transforming data.
- It’s nonlinear: You often need to revisit earlier steps as new issues or insights emerge.
- Goals evolve: Initial analytical objectives can shift as you explore the data, requiring additional adjustments.
Even seemingly simple tasks – like cleaning, calculating new variables, or deriving insights – can consume far more time than expected, particularly when you rely on tools that aren’t optimized for these workflows.
Why spreadsheets can limit your data prep workflow
Spreadsheets, most commonly Excel, are familiar to almost everyone. But for the complex process of turning raw data into information suitable for decision making, they can be a significant limitation: Excel is not ideal as a fast, systematic, and easily repeatable tool for advanced analytics.
Here are several ways Excel can impede your data preparation efforts:
- Complicated process development: Building a repeatable process in Excel is time-consuming and error-prone, and maintaining a robust, mistake-free workflow over time is harder still.
- Fragmented data management: When data is pulled from external databases into Excel, you can end up with disconnected spreadsheets. This approach is confusing and error-prone.
- Limited visual interaction for cleaning: Human beings are visual creatures. While Excel allows for conditional formatting, it doesn’t provide the interactive visual summaries needed for iterative cleanup and exploration.
- Scalability and overview issues: Large data sets can quickly exceed the limits of a single screen, making a holistic view impossible. A visual summary becomes essential for insights and informed decisions.
Despite their familiarity, spreadsheets are not the best choice for comprehensive data preparation in scientific and engineering workflows.
Should you use code for data preparation?
When choosing analytical tools, it helps to consider whether a task is a one-time project or part of a recurring process. For one-off tasks, people often fall back on familiar tools like spreadsheets (a choice not without pitfalls, as we discussed above). For recurring tasks, many turn to coding, assuming it will provide a reusable solution to the challenges of data preparation. While programming can be powerful, it also adds complexity.
Here's why coding may not always be the best answer:
- Disconnect from decision making: Coding often falls to specialists who aren’t as familiar with the data or with the analytical questions the results must answer.
- Lack of visual feedback: Code rarely provides the visual, interactive cues that reveal the state of the data, making it hard to judge how much data quality has actually improved.
- Handoff challenges: While coding provides some level of process documentation, it can create problems if ownership changes. A new team member may struggle to interpret or adapt someone else's code.
Coding, in short, can create gaps: it distances users from the data, obscures quality issues, and complicates handoffs. Interactive tools that blend cleanup with exploration provide a faster, clearer path to reliable results.
What other data cleaning and calculation tasks slow down scientists and engineers?
Beyond the challenges of tools and workflows, the sheer variety of tasks in data preparation also eats up time. Data preparation is about more than tidying – it often involves creating new variables and transformations that make data usable. These tasks should be quick and repeatable, especially since new requirements often arise on the fly during exploration.
Here are examples of crucial data cleaning and calculation tasks that can be particularly time-intensive (a short code sketch follows the list):
- Time-based and numerical derivations: Creating new values from timestamps (e.g., durations) or from numerical combinations such as percentages, standardizations, or other mathematical operations.
- Text and categorical transformations: Extracting strings, combining text, or collapsing complex categories into simpler groups for easier analysis.
- Binning and classification: Turning continuous values into discrete categories (e.g., converting a range of numbers into “pass/fail” or “yes/no” indicators).
- Summaries and advanced calculations: Generating descriptive statistics (means, standard deviations) or more complex values like derivatives or powers.
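If you do reach for code for these tasks, here is roughly what they look like in practice. This is a minimal sketch in Python with pandas; the column names (start, end, operator, temperature) and the pass/fail cutoff are hypothetical, standing in for whatever your own data contains.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "start": pd.to_datetime(["2025-01-01 08:00", "2025-01-01 09:30"]),
    "end": pd.to_datetime(["2025-01-01 08:45", "2025-01-01 11:00"]),
    "operator": ["Smith, J.", "Lee, A."],
    "temperature": [71.2, 98.6],
})

# Time-based derivation: duration in minutes from two timestamps.
df["duration_min"] = (df["end"] - df["start"]).dt.total_seconds() / 60

# Numerical derivation: standardize a measurement as a z-score.
df["temp_z"] = (df["temperature"] - df["temperature"].mean()) / df["temperature"].std()

# Text transformation: extract the surname from "Surname, Initial".
df["surname"] = df["operator"].str.split(",").str[0]

# Binning / classification: continuous value -> pass/fail indicator
# (the cutoff of 90 is an invented threshold for illustration).
df["qc_result"] = pd.cut(
    df["temperature"],
    bins=[-float("inf"), 90, float("inf")],
    labels=["pass", "fail"],
)

# Summaries: descriptive statistics (means, standard deviations, etc.).
print(df[["duration_min", "temperature"]].describe())
```

Each transformation is only a line or two, but notice how much context you must hold in your head to write and verify them – exactly the kind of work that benefits from interactive, visual feedback.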
Beyond these, messy data raises harder questions (see the sketch after this list):
- Outlier management: How should you handle data points that fall far outside expectations? Either keeping or discarding them can introduce bias. Robust tools offer flexible ways to assess and manage outliers.
- Missing data: Should missing values be ignored, filled in, or treated as meaningful in themselves? Flexible tools let you address missing data in ways that align with your analysis goals.
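As an illustration of one common approach (not the only one), the pandas sketch below flags outliers with a simple 1.5×IQR rule instead of silently deleting them, and makes the missing-data choice explicit. The series name and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; the name, values, and the 1.5*IQR rule are
# illustrative choices, not a universal prescription.
s = pd.Series([9.8, 10.1, 10.0, 55.0, 9.9, np.nan, 10.2], name="yield_g")

# Outlier management: flag, rather than delete, points outside 1.5x the
# interquartile range, so the keep/discard decision stays reviewable.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
print(s[is_outlier])  # inspect flagged points before deciding anything

# Missing data: three deliberate options with different implications.
dropped = s.dropna()           # ignore missing values entirely
filled = s.fillna(s.median())  # fill in with a robust central value
missing_flag = s.isna()        # or treat missingness itself as a signal
```

The key design choice here is that flagging keeps the decision visible and reversible: you can review the flagged points with domain knowledge before excluding anything from the analysis.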
To derive maximum value from their data, scientists and engineers must be able to efficiently perform creative transformations, derive new variables, and make critical decisions about which data to include, exclude, or adjust.
Reclaim your time and accelerate discovery
Much of the time lost to data preparation stems from relying on tools and processes that aren’t built for the job. The solution is a platform that integrates all stages of data prep with analysis, supports iterative workflows, and provides visual interaction with systematic documentation.
Look for a platform that can:
- Combine all stages of data prep with exploratory data analysis (EDA). Seamlessly clean, transform, and calculate within an interactive environment.
- Support iterative, nonlinear workflows. Easily adjust and backtrack as new insights emerge.
- Provide strong visual interaction. Quickly spot and resolve data quality issues.
- Document and standardize your workflow. Make your work easy to revisit, share, and scale for future projects.
By meeting these needs, you can turn data preparation from a frustrating bottleneck into a streamlined, even satisfying, part of the discovery process – freeing up more time for true innovation.
Want to reclaim more of your time?
Check out this white paper to see how JMP can help end your data-wrangling struggle.