---
title: "Sharing and Archiving Qualitative Data"
editor: 
  markdown: 
    wrap: 72
---

## Why Share Qualitative Data?

Sharing qualitative data benefits both the scholarly community and
researchers in several ways:

1.  Fostering Public Trust: Transparency enhances public confidence in
    research outcomes, vital for securing funding and support for future
    projects. It allows for verification of claims, reinforcing trust in
    the research.

2.  Dynamic Research Environment: While qualitative research invites
    diverse interpretations, sharing data fosters improved research
    quality through collaborative critique and examination.

3.  Enabling New Research: Access to shared data inspires innovative
    analyses, maximizing the scientific value of existing studies.

4.  More Effective Use of Resources: Data sharing reduces costs related
    to new data collection, promoting efficient resource utilization,
    and minimizing the burden on frequently targeted communities.

5.  Skill Development for Trainees: It offers students valuable
    opportunities to learn coding and analysis techniques, enhancing
    their educational experience.

6.  Receiving Credit: Sharing data ensures proper attribution, allowing
    researchers to gain recognition for their work.

7.  Opportunities for Collaboration: Open data fosters partnerships
    among researchers, leading to new insights and advancements.

## Sharing with Caring

When sharing data, researchers should make their best effort to provide
complete, good-quality documentation to support reuse.

Before we dive into what researchers should share and where, let's
explore something together.

::: {.callout-note collapse="true" icon="false"}
# 💭 **Discussion:** Comparing Data Deposits

Please open the links to the two data deposits below:

Taherzadeh, O., 2016, "Interview Transcripts", *Interview Transcripts*,
<https://doi.org/10.7910/DVN/4C9KFK/XRREIY>, Harvard Dataverse, V1

Klein, M., 2022. *Interview transcripts of addiction therapists and
recovering drug service users.* Bath: University of Bath Research Data
Archive. Available from: <https://doi.org/10.15125/BATH-01096>{target='_blank'}.

Can you spot any differences? Supposing both topics were related to your
research, how likely would you be to reuse one dataset versus the other?
Why?

::: {collapse="true"}
**Context and Documentation**

-   Taherzadeh (2016): This deposit lacks detailed contextual
    information about the study, such as the sample, interview
    questions, study goals, or informed consent details. It is a
    standalone collection of transcripts.

-   Klein (2022): This deposit provides clearer context, the objectives
    of the research and questions asked, and links to the associated
    dissertation.

**Reuse Value**

Taherzadeh (2016): Low

-   The absence of context and supporting documentation makes it
    challenging to assess the dataset’s validity, reliability, and
    relevance to other research. Without knowing the background or how
    the data was collected, it's difficult to justify its use in further
    studies.

Klein (2022): Higher

-   The dataset seems to come with comprehensive documentation,
    including context about the participants and the study goals. This
    information facilitates a better understanding of how to apply the
    data effectively in new research, making it much more reusable.
:::
:::

## Considerations on What to Share

Remember when we discussed the importance of outlining data-sharing
plans in Data Management Plans (DMPs)? At this stage, Sarah could
greatly benefit from having a clear strategy for archiving and storing
her data. As we discussed earlier, understanding the available options
and having at least a rough plan for what will be shared, along with
strategies to facilitate the process, is very important. We provided
Sarah with recommendations on what to document, and we hope this
guidance will empower her to share her research deliverables confidently
while adhering to key principles of open practices.

It is also important to recall the need to balance the value of open
sharing against the risks of harm associated with the identification of
participants, communities, and research sites. The good news is that
there are more options in between data being closed and open!

Depending on your project needs and what was agreed to in the informed
consent, we recommend evaluating access control options. They will help
you determine which data repository is most suitable for storing and
preserving your project data.

### Access Control Questions

Access controls fall into three main categories:

-   *Who* can access your data? Access may be limited to qualified
    researchers, often requiring proof of interest through a research
    proposal, or it may require pre-approval from an Institutional
    Review Board (IRB) for general requests.

-   *How* can others access your data? Secure internet connections,
    along with agreements regarding data storage and destruction, might
    be required for downloading data. Researchers may sometimes need to
    access data in person on a secure, offline computer. Hybrid
    solutions, like ICPSR’s “virtual enclave,” allow remote viewing
    without data leaving the server.

-   *When* can others access your data? Embargoes can temporarily
    restrict access to protect human participants, often allowing
    researchers to publish findings before broader access. These
    embargoes can also facilitate long-term data availability, with set
    dates for lifting restrictions, as seen in historical archives.

### Sharing Levels

-   Openly available: data (typically de-identified) shared with no
    restrictions.

Example: Cunningham, Una; De Brún, Aoife; Mayumi, Willgerodt et al.
(2021). Appendices interview formats \[Dataset\]. Dryad.
<https://doi.org/10.5061/dryad.q83bk3jg8>{target='_blank'}

-   Subject to Embargo: a temporary restriction on sharing or publishing
    data. It means that the data can’t be made public for a set period,
    usually to protect sensitive information, allow for further
    analysis, or wait for a specific event, such as a formal
    publication, before releasing it.

Example: Ibitoye, Mobolaji; OlaOlorun, Funmilola; Casterline, John B.
2025. "Demand for Modern Contraception in Sub-Saharan Africa: New
Methods, New Evidence". Qualitative Data Repository.
<https://doi.org/10.5064/F600CMLO>{target='_blank'}. QDR Main Collection. V1

-   Closed Access/Metadata Record Only (sensitive data/no consent): a
    summary and description of a dataset that provides essential
    information about its provenance, structure, and context without
    including the actual data.

Depending on the research case, access can be provided through a Data
Use Agreement (DUA) and involve a data enclave for safe access. These
requirements will also depend on IRB and consent form agreements.

-   Data Use Agreement (DUA) required: a contract that outlines the
    terms and conditions for a recipient to use data from a data owner.
    It's specific to a project or study and can include limitations on
    use, data safeguarding obligations, and privacy rights. Some
    supplementary files (e.g., codebooks, data collection instruments,
    selected processed data to reproduce specific figures or support
    some findings) may still be shared openly alongside the restricted
    data.

Example: Steeves, Vicky; Peltzman, Shira; Kim, Julia; Griesinger, Peggy;
Blumenthal, Karl-Rainer. 2020. "Data for: 'What’s Wrong with Digital
Stewardship: Evaluating the Organization of Digital Preservation
Programs from Practitioners’ Perspectives'". Qualitative Data Repository.
<https://doi.org/10.5064/F6DJRPLK>{target='_blank'}.

::: {.callout-note collapse="true" icon="false"}
# 💭 **Discussion:** What is the value of sharing a metadata record only?

A metadata-only record for research data that isn't openly available
enables readers to quickly evaluate whether they want to request access.
While a well-crafted Data Availability Statement in journal papers
serves a similar purpose, a metadata-only record in a suitable
repository offers the benefit of being discoverable through data-focused
searches, along with the ability to provide more detailed descriptions
through rich, linked, and interoperable metadata.
:::

### A Note About DAS

Data Availability Statements (DAS) are crucial for the credibility of
manuscripts and other published research. They provide interested
readers—and sometimes automated algorithms—access to the underlying data
supporting your claims, allowing them to verify those assertions or use
the data for further research. We suggest following some best practices
for crafting statements that are both effective and clear while also
complying with funder and journal policy requirements.

<iframe width="50%" height="800" src="https://rcd.ucsb.edu/sites/default/files/2024-02/DLS-202402-DataAvailability_navy.pdf">

</iframe>

Source: UCSB Library Data Literacy Series
([perma.cc/3ZHR-6JAG](https://perma.cc/3ZHR-6JAG){target='_blank'})

### Applying Access Controls

Implementing access controls involves a trade-off: while stricter
controls reduce misuse risk, they can hinder beneficial access. Though
powerful, they should not unnecessarily complicate access to low-risk
data. As the principal steward of your data, you ultimately decide on
access controls. However, it’s advisable to involve repository staff in
this process, as they can highlight potential challenges, ensuring that
your data remains accessible and ethically shared in the long run.

-   Do consider sharing de-identified transcripts openly while placing
    recordings under more stringent access controls.

-   Do keep a list of de-identification rules for yourself and your team
    should you collaborate. This list serves as necessary documentation
    when you share your data. See, for example, the protocol Thad
    Dunning and Edward Camp used to de-identify data deposited with the
    Qualitative Data Repository. This document is separate from the key
    that links de-identified entries to the individuals or entities
    interviewed, which should not be included when sharing your data.

-   Do check the document properties of files, which may contain
    identifiers such as original file names identifying interview
    respondents.

-   Finally, do try to strike a balance between keeping your
    participants’ information confidential and unnecessarily reducing
    the analytic value of the data by removing too much information. If
    you are having difficulties striking that balance, you could ask
    another subject-matter expert for assistance; some repository
    personnel or data librarians can also provide abstract rules that
    you can follow.
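
The document-property check mentioned in the list above can be partly
automated. The following is a minimal sketch, assuming the transcripts
are stored as `.docx` files, which are ZIP archives that keep their core
properties in `docProps/core.xml`. The function name
`docx_core_properties` and the set of fields to review are illustrative
choices, not requirements of any repository, and other file types would
need a different approach.

```python
import xml.etree.ElementTree as ET
import zipfile

# Core-property fields that often carry personal names or original titles.
# This selection is an assumption for illustration; extend it as needed.
FIELDS_TO_REVIEW = {
    "creator", "lastModifiedBy", "title", "subject", "keywords", "description",
}


def docx_core_properties(path):
    """Return the non-empty core properties of a .docx file worth a manual review."""
    with zipfile.ZipFile(path) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))

    found = {}
    for element in root:
        name = element.tag.split("}")[-1]  # drop the XML namespace part of the tag
        if name in FIELDS_TO_REVIEW and element.text and element.text.strip():
            found[name] = element.text.strip()
    return found
```

Running the function over every transcript in a folder and reading the
results by eye is usually enough to spot a respondent's name left in a
title or author field. The script only reports values; deciding what to
remove remains a human judgment.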

### What data?

ICPSR's Guide for Sharing Qualitative Data outlines examples of
qualitative data sources that may be archived for secondary analysis:

-   Interview methods, including those captured through notes, audio, and
    video

    -   In-depth and/or unstructured interviews

    -   Semi-structured interviews

    -   Focus group interviews

-   Diary studies that are unstructured or use semi-structured writing
    prompts

-   Observational studies that generate field notes and other text and
    information

    -   Naturalistic observation of real-world environments (e.g.,
        classrooms, workplaces, healthcare facilities, courtrooms,
        public spaces)

    -   Participant observation, where the researcher becomes an active
        part of the setting to collect information (e.g., online gaming,
        community policing, nightclub culture)

    -   Structured observation, where the researcher has predefined
        objectives and a systematic approach to collecting information.
        This would include case studies.

-   Text from available sources

    -   Meeting minutes

    -   Official records

    -   Medical records

    -   News sources and social media

    -   Excerpts of copyrighted materials (e.g., literature, film, music)

-   Survey methods or questionnaires with substantial open-ended comments

### Open formats

Why should we prioritize open file formats in our research? Imagine
sharing your groundbreaking findings and ensuring that anyone, anywhere,
can access and build upon your work without running into compatibility
issues. Open formats offer exactly that: freedom from proprietary
software constraints. By choosing open formats, you enhance
collaboration and transparency and make your research more sustainable
for others and your future self.

There is a diversity of open formats available across different types of
media that can be of great use to qualitative data researchers,
including audio, video, image, and text. Refer to the handout below for
some examples:

<iframe width="50%" height="800" src="https://rcd.ucsb.edu/sites/default/files/2023-03/dls-n07-2021-openformats-navy.pdf">

</iframe>

Source: UCSB Library Data Literacy Series
([perma.cc/W4FL-JDFT](https://perma.cc/W4FL-JDFT){target='_blank'})
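
As a rough illustration of how a project folder could be screened before
deposit, the sketch below flags files saved in a few common proprietary
formats and suggests an open alternative for each. The mapping is an
assumption made for this example and is not taken from the handout;
check your repository's preferred-format list before converting
anything.

```python
from pathlib import Path

# Illustrative mapping only (an assumption, not an official list):
# proprietary extensions and open alternatives to consider.
OPEN_ALTERNATIVES = {
    ".doc": ".odt or .txt",
    ".xls": ".csv",
    ".sav": ".csv",
    ".wma": ".flac",
    ".psd": ".png",
}


def suggest_open_formats(filenames):
    """Map each file with a listed proprietary extension to a suggested open format."""
    suggestions = {}
    for name in filenames:
        extension = Path(name).suffix.lower()  # compare extensions case-insensitively
        if extension in OPEN_ALTERNATIVES:
            suggestions[name] = OPEN_ALTERNATIVES[extension]
    return suggestions
```

Files whose extension is not in the mapping are simply left out of the
result, so an empty dictionary means nothing was flagged, not that every
file is guaranteed to be in an open format.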

### Where Should You Share Your Project Data?

The decision of where to archive data is crucial for ensuring its
accessibility, integrity, and long-term preservation. Selecting a
stable, certified repository not only safeguards the data against loss
or corruption but also enhances its credibility and usability within the
research community. Unlike sharing via email, personal communication, or
unsecured websites—methods that can lead to data loss, miscommunication,
and lack of traceability—certified repositories provide a structured and
secure environment for data management.

Such repositories adhere to rigorous standards for data storage and
access, ensuring that shared data remains discoverable, citable, and
protected over time. By thoughtfully choosing the right repository,
researchers can maximize the impact of their work, facilitate
reproducibility, and contribute to the advancement of knowledge across
various fields.

Beyond support for access controls when required, the choice of a
repository to archive QHS data should take into account several factors
laid out in the handout below:

<iframe width="50%" height="800" src="https://rcd.ucsb.edu/sites/default/files/2023-03/dls-n05-2021-dr-navy.pdf">

</iframe>

Source: UCSB Library Data Literacy Series
([perma.cc/WLF7-WTUC](https://perma.cc/WLF7-WTUC){target='_blank'}).

### Preparing Your Data for Submission

A few required and recommended files should be included in your project
submission package.

Required:

-   Processed de-identified data (e.g., transcripts);

-   Coded Data (supporting excerpts);

-   README File: an overview of your project, including data sources,
    their relationships and a brief description of the methods. Here is
    a [customizable README
    template](https://zenodo.org/records/10828379){target='_blank'};

-   Data Collection Instruments: A sample of instruments used for data
    collection, such as surveys or interview guides;

-   Codebook: the coding framework used, including definitions of codes
    and categories.


Recommended:

-   Informed consent statement(s), if applicable;

-   IRB protocol, if applicable;

-   Study protocol or procedures manual, if applicable.

<iframe width="50%" height="800" src="https://rcd.ucsb.edu/sites/default/files/2024-07/DLS-202407-QualDataSharing.pdf">

</iframe>

Source: UCSB Library Data Literacy Series
([perma.cc/E7BA-BBYE](https://perma.cc/E7BA-BBYE){target='_blank'}).
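
Before submitting, it can help to check the package against the list of
required files above. The sketch below is one possible way to do that,
assuming a hypothetical naming convention in which each required item
can be recognised by a keyword in its file name (for example,
`README.md` or `codebook.csv`). The keywords and the function name
`missing_items` are placeholders to adapt to your own project.

```python
from pathlib import Path

# Hypothetical convention: each required item is recognised by a keyword
# appearing in at least one file name. Adjust the keywords to your project.
REQUIRED_ITEMS = {
    "de-identified data": "transcript",
    "coded data": "coded",
    "README file": "readme",
    "data collection instrument": "instrument",
    "codebook": "codebook",
}


def missing_items(package_dir):
    """Return the required items for which no matching file name was found."""
    names = [p.name.lower() for p in Path(package_dir).rglob("*") if p.is_file()]
    return [
        item
        for item, keyword in REQUIRED_ITEMS.items()
        if not any(keyword in name for name in names)
    ]
```

An empty list means every required item has at least one matching file
name. The check says nothing about the quality or completeness of the
files themselves.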

### Licensing Your Data

Research data itself is generally not copyrightable because it consists
of facts, figures, and raw information that cannot be considered
original creative expression. Copyright protects the unique expression
of ideas, such as written texts, artwork, and music, rather than the
underlying data or factual content.

Most data repositories adhere to open licenses such as CC0 (Creative
Commons Zero) or CC BY (Creative Commons Attribution) to encourage broad
accessibility and reuse of data. These licenses promote the free sharing
of knowledge, allowing researchers and practitioners to utilize, modify,
and redistribute data without significant restrictions, ultimately
fostering collaboration and innovation within the scientific community.

However, researchers may choose to assign different licenses to other
creative deliverables and supplementary materials associated with their
projects, such as reports, presentations, or multimedia content. For
example, Sarah might opt for a CC BY-NC (Attribution-NonCommercial)
license for an infographic she created to represent ethical approaches
in the social media influencing market, restricting its use for
commercial purposes. This flexibility allows Sarah and the research
community at large to balance openness with the need to protect specific
aspects of their intellectual property while still contributing to the
collective body of knowledge.

The handout below provides more insights about licenses, including the
Creative Commons family:

<iframe width="50%" height="800" src="https://rcd.ucsb.edu/sites/default/files/2023-03/dls-n10-2021-licensing-navy_0.pdf">

</iframe>

Source: UCSB Library Data Literacy Series
([perma.cc/ET6F-N84X](https://perma.cc/ET6F-N84X){target='_blank'}).

------------------------------------------------------------------------

**Recommended/Cited Sources:**

Campbell R, Javorka M, Engleton J, Fishwick K, Gregory K,
Goodman-Williams R. Open-Science Guidance for Qualitative Research: An
Empirically Validated Approach for De-Identifying Sensitive Narrative
Data. *Advances in Methods and Practices in Psychological Science*.
2023;6(4).
doi:[10.1177/25152459231205832](https://doi.org/10.1177/25152459231205832){target='_blank'}

Myers CA, Long SE, Polasek FO. Protecting participant privacy while
maintaining content and context: Challenges in qualitative data
de-identification and sharing. *Proc Assoc Inf Sci Technol*.
2020;57:e415. <https://doi.org/10.1002/pra2.415>{target='_blank'}

DuBois JM, Strait M, Walsh H. Is it time to share qualitative research
data? *Qualitative Psychology*. 2018;5(3):380–393.
<https://doi.org/10.1037/qup0000076>{target='_blank'}