Load dependencies from SPDX SBOMs #1145 #1345

Open: wants to merge 11 commits into main
1 change: 0 additions & 1 deletion pyproject.toml
@@ -35,4 +35,3 @@ max-complexity = 10
[tool.ruff.lint.per-file-ignores]
# Allow the usage of assert in the test_spdx file.
"**/test_spdx.py*" = ["S101"]
"scanpipe/pipes/spdx.py" = ["UP006", "UP035"]
7 changes: 7 additions & 0 deletions scancodeio/static/tree-views/expand-collapse.svg
75 changes: 75 additions & 0 deletions scancodeio/static/tree-views/tree.css
@@ -0,0 +1,75 @@
.tree{
--spacing : 1.5rem;
--radius : 10px;
}

.tree li{
display : block;
position : relative;
padding-left : calc(2 * var(--spacing) - var(--radius) - 2px);
}

.tree ul{
margin-left : calc(var(--radius) - var(--spacing));
padding-left : 0;
}

.tree ul li{
border-left : 2px solid #ddd;
}

.tree ul li:last-child{
border-color : transparent;
}

.tree ul li::before{
content : '';
display : block;
position : absolute;
top : calc(var(--spacing) / -2);
left : -2px;
width : calc(var(--spacing) + 2px);
height : calc(var(--spacing) + 1px);
border : solid #ddd;
border-width : 0 0 2px 2px;
}

.tree summary{
display : block;
cursor : pointer;
}

.tree summary::marker,
.tree summary::-webkit-details-marker{
display : none;
}

.tree summary:focus{
outline : none;
}

.tree summary:focus-visible{
outline : 1px dotted #000;
}

.tree li::after,
.tree summary::before{
content : '';
display : block;
position : absolute;
top : calc(var(--spacing) / 2 - var(--radius));
left : calc(var(--spacing) - var(--radius) - 1px);
width : calc(2 * var(--radius));
height : calc(2 * var(--radius));
border-radius : 50%;
background : #ddd;
}

.tree summary::before{
z-index : 1;
background : #696 url('expand-collapse.svg') 0 0;
}

.tree details[open] > summary::before{
background-position : calc(-2 * var(--radius)) 0;
}
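tree.css expects nested lists whose branch nodes are `<details>`/`<summary>` disclosure widgets (see the `.tree li`, `.tree summary` and `.tree details[open] > summary::before` selectors). The following is a minimal sketch of generating that markup from a nested mapping, for instance to display a dependency tree. `render_tree` and its input shape are hypothetical and not part of this PR, and the purls in the demo are made-up example data.

```python
from html import escape


def render_tree(tree, open_depth=1):
    """Render a nested mapping as markup for the ``.tree`` styles in tree.css.

    ``tree`` maps each label to a nested mapping (a branch) or to None or an
    empty mapping (a leaf). Branches at a depth below ``open_depth`` start
    expanded.
    """

    def render_items(nodes, depth):
        items = []
        for label, children in nodes.items():
            text = escape(str(label))
            if children:
                # Branch: a disclosure widget wrapping a nested list.
                open_attr = " open" if depth < open_depth else ""
                items.append(
                    f"<li><details{open_attr}><summary>{text}</summary>"
                    f"<ul>{render_items(children, depth + 1)}</ul></details></li>"
                )
            else:
                # Leaf: a plain list item.
                items.append(f"<li>{text}</li>")
        return "".join(items)

    return f'<ul class="tree">{render_items(tree, 0)}</ul>'


if __name__ == "__main__":
    print(render_tree({"pkg:pypi/app@1.0": {"pkg:pypi/lib-a@2.0": None}}))
```

Leaves are plain `<li>` items, so they only get the grey dot from `.tree li::after`; branches also get the expand/collapse icon from `.tree summary::before`, which is drawn over the dot thanks to `z-index: 1`.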
10 changes: 10 additions & 0 deletions scancodeio/static/tree-views/tree.css.ABOUT
@@ -0,0 +1,10 @@
about_resource: tree.css
name: css-tree-views
homepage_url: https://iamkate.com/code/tree-views/
description: A tree view (collapsible list) can be created using only html and css, without
the need for JavaScript. Accessibility software will see the tree view as lists nested inside
disclosure widgets, and the standard keyboard interaction is supported automatically.
license_expression: cc0-1.0
licenses:
- key: cc0-1.0
name: cc0-1.0
29 changes: 29 additions & 0 deletions scancodeio/static/tree-views/tree.css.NOTICE
@@ -0,0 +1,29 @@
Free content on iamkate.com
The web was still young when I first went online in 1998. It felt like a utopian dream of free culture and free knowledge. Anyone could contribute, and within weeks I had learnt html and created my first site, hosted in the 10mb of webspace my isp included as standard.

I’ve watched as the dream has become a nightmare of surveillance and monetisation. Companies such as Google and Facebook offer their services for free to the public because their real products are their advertising networks powered by the personal data of their visitors.

The only concern of these companies and their shareholders is to maximise their income from advertising, regardless of the costs to society. They use dubious schemes to avoid paying tax. They encourage addiction and risk the mental health of their visitors. They threaten democratic institutions.

I have little influence over the wider web, but I can control my small part of it, creating a haven that remains true to the original dream. This page describes my approach to copyright, my promise to protect the privacy of my visitors, and my commitment to transparency.

Copyright
Copyright limits creativity and holds back progress by restricting our rights to build upon the works of others. Copyleft licences attempt to use copyright against itself, but “the master’s tools will never dismantle the master’s house”, as Audre Lorde remarked in a different context.

All content on my site is released under the terms of the Creative Commons CC0 1.0 Universal Legal Code. This means I have waived all copyright and related rights to the extent possible under law, with the intention of dedicating the content to the public domain. You can use and adapt it without attribution.

Privacy
Every site is hosted on a server, which is usually operated by a third party due to the expertise needed to manage servers securely. Most sites are accessed indirectly through the servers of a content delivery network, which protects the original server from attacks that could disable the site.

My site is hosted on Cloudflare Pages. Cloudflare is both the host and the content delivery network, avoiding the need to trust two separate third parties. Cloudflare have a strong commitment to privacy and data protection, and frequently write about developing systems to protect visitor privacy.

Almost every site today includes code that tracks visitors for statistical and advertising purposes. Often the site owner includes code with the deliberate aim of tracking their visitors, but sometimes they just want to include a feature provided by a third party, and that provider includes their own tracking code.

My site doesn’t include any tracking code, and doesn’t load any code from third parties. It doesn’t have a cookie banner because it doesn’t use cookies. Instead of an invasive analytics system, Cloudflare Web Analytics gives me the most important statistics without tracking individual visitors.

Transparency
You probably don’t know me, and shouldn’t have to trust me. Instead, you should be able to check security and privacy claims for yourself. Unfortunately most sites today use a process called code minification, which makes them faster but also makes it harder for other people to understand their code.

The Mozilla Observatory report for my site confirms the presence of various security and privacy features, resulting in a perfect A+ rating. One of these features, the content security policy, prevents browsers from loading code and other resources from third parties.

My site doesn’t need to use code minification in order to load quickly due to its simple design, efficient implementation, and absence of resources loaded from third parties. As a result, other software developers can easily understand how the layout, styling, and interactive features are created.
98 changes: 61 additions & 37 deletions scanpipe/models.py
@@ -3591,6 +3591,18 @@ def as_cyclonedx(self):
class DiscoveredDependencyQuerySet(
PackageURLQuerySetMixin, VulnerabilityQuerySetMixin, ProjectRelatedQuerySet
):
def project_dependencies(self):
return self.filter(for_package__isnull=True)

def package_dependencies(self):
return self.filter(for_package__isnull=False)

def resolved(self):
return self.filter(resolved_to_package__isnull=False)

def unresolved(self):
return self.filter(resolved_to_package__isnull=True)

def prefetch_for_serializer(self):
"""
Optimized prefetching for a QuerySet to be consumed by the
@@ -3624,6 +3636,15 @@ class DiscoveredDependency(
system and application packages discovered in the code under analysis.
Dependencies are usually collected from parsed package data such as a package
manifest or lockfile.

This class manages dependencies with the following considerations:

1. A dependency can be associated with a Package via the "for_package" field.
In this case, it is termed a "Package's dependency". If there is no such
association, the dependency is considered a "Project's dependency".

2. A dependency can also be linked to a Package through the "resolved_to_package"
field. When this link exists, the dependency is considered "resolved".
"""

# Overrides the `project` field to set the proper `related_name`.
@@ -3774,6 +3795,18 @@ def datafile_path(self):
if self.datafile_resource:
return self.datafile_resource.path

@property
def is_project_dependency(self):
return not bool(self.for_package_id)

@property
def is_for_package(self):
return bool(self.for_package_id)

@property
def is_resolved_to_package(self):
return bool(self.resolved_to_package_id)

@classmethod
def create_from_data(
cls,
@@ -3789,6 +3822,12 @@ def create_from_data(
Create and return a DiscoveredDependency for a `project` from the
`dependency_data`.

The `for_package` and `resolved_to_package` FKs can be provided as arguments,
or looked up from the `for_package_uid` and `resolve_to_package_uid` values
of `dependency_data`.
Note that a dependency without a `for_package` FK is a project dependency, and
a dependency without a `resolved_to_package` FK is unresolved.

If `strip_datafile_path_root` is True, then `create_from_data()` will
strip the root path segment from the `datafile_path` of
`dependency_data` before looking up the corresponding CodebaseResource
@@ -3797,51 +3836,36 @@ def create_from_data(
not stripped for `datafile_path`.
"""
dependency_data = dependency_data.copy()
required_fields = ["purl", "dependency_uid"]
missing_values = [
field_name
for field_name in required_fields
if not dependency_data.get(field_name)
]
project_packages_qs = project.discoveredpackages

if missing_values:
message = (
f"No values for the following required fields: "
f"{', '.join(missing_values)}"
)
if not dependency_data.get("dependency_uid"):
dependency_data["dependency_uid"] = str(uuid.uuid4())

project.add_warning(description=message, model=cls, details=dependency_data)
return
for_package_uid = dependency_data.get("for_package_uid")
if not for_package and for_package_uid:
for_package = project_packages_qs.get_or_none(package_uid=for_package_uid)

if not for_package:
for_package_uid = dependency_data.get("for_package_uid")
if for_package_uid:
for_package = project.discoveredpackages.get(
package_uid=for_package_uid
)

if not resolved_to_package:
resolved_to_uid = dependency_data.get("resolved_to_uid")
if resolved_to_uid:
resolved_to_package = project.discoveredpackages.get(
package_uid=resolved_to_uid
)
resolve_to_package_uid = dependency_data.get("resolve_to_package_uid")
if not resolved_to_package and resolve_to_package_uid:
resolved_to_package = project_packages_qs.get_or_none(
package_uid=resolve_to_package_uid
)

if not datafile_resource:
datafile_path = dependency_data.get("datafile_path")
if datafile_path:
if strip_datafile_path_root:
segments = datafile_path.split("/")
datafile_path = "/".join(segments[1:])
datafile_resource = project.codebaseresources.get(path=datafile_path)
datafile_path = dependency_data.get("datafile_path")
if not datafile_resource and datafile_path:
if strip_datafile_path_root:
segments = datafile_path.split("/")
datafile_path = "/".join(segments[1:])
datafile_resource = project.codebaseresources.get(path=datafile_path)

if datasource_id:
dependency_data["datasource_id"] = datasource_id

# Set purl fields from `purl`
# Set package_url fields from the ``purl`` string.
purl = dependency_data.get("purl")
purl_mapping = PackageURL.from_string(purl).to_dict()
dependency_data.update(**purl_mapping)
if purl:
purl_data_dict = PackageURL.from_string(purl).to_dict()
dependency_data.update(**purl_data_dict)

cleaned_data = {
field_name: value
@@ -3875,7 +3899,7 @@ def populate_dependency_uuid(cls, dependency_data):
def spdx_id(self):
return f"SPDXRef-scancodeio-{self._meta.model_name}-{self.dependency_uid}"

def as_spdx(self):
def as_spdx_package(self):
"""Return this Dependency as an SPDX Package entry."""
from scanpipe.pipes import spdx

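The docstring and the new QuerySet methods and properties classify a dependency along two independent axes: whether `for_package` is set (package vs. project dependency) and whether `resolved_to_package` is set (resolved vs. unresolved). The plain-Python sketch below mirrors that logic without Django. `Dep` is a hypothetical stand-in for `DiscoveredDependency` that keeps only the two foreign-key ids, and the list filters stand in for the `__isnull` QuerySet lookups.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dep:
    """Hypothetical stand-in for DiscoveredDependency: only the FK ids matter."""

    purl: str
    for_package_id: Optional[int] = None
    resolved_to_package_id: Optional[int] = None

    @property
    def is_project_dependency(self) -> bool:
        return not self.for_package_id

    @property
    def is_for_package(self) -> bool:
        return bool(self.for_package_id)

    @property
    def is_resolved_to_package(self) -> bool:
        return bool(self.resolved_to_package_id)


# Plain-Python equivalents of the `<fk>__isnull=True/False` QuerySet filters.
def project_dependencies(deps: list[Dep]) -> list[Dep]:
    return [d for d in deps if d.for_package_id is None]


def package_dependencies(deps: list[Dep]) -> list[Dep]:
    return [d for d in deps if d.for_package_id is not None]


def resolved(deps: list[Dep]) -> list[Dep]:
    return [d for d in deps if d.resolved_to_package_id is not None]


def unresolved(deps: list[Dep]) -> list[Dep]:
    return [d for d in deps if d.resolved_to_package_id is None]
```

Because the two axes are independent, a project dependency can still be resolved: `Dep("pkg:npm/d@4.0.0", resolved_to_package_id=3)` is returned by both `project_dependencies()` and `resolved()`.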
16 changes: 12 additions & 4 deletions scanpipe/pipelines/load_sbom.py
@@ -20,6 +20,7 @@
# ScanCode.io is a free software code scanning tool from nexB Inc. and others.
# Visit https://github.com/aboutcode-org/scancode.io for support and download.

from scanpipe.models import DiscoveredDependency
from scanpipe.pipelines.scan_codebase import ScanCodebase
from scanpipe.pipes import resolve

@@ -44,7 +45,7 @@ def steps(cls):
cls.flag_empty_files,
cls.flag_ignored_resources,
cls.get_sbom_inputs,
cls.get_packages_from_sboms,
cls.get_data_from_sboms,
cls.create_packages_from_sboms,
cls.create_dependencies_from_sboms,
)
@@ -53,13 +54,13 @@ def get_sbom_inputs(self):
"""Locate all the SBOMs among the codebase resources."""
self.manifest_resources = resolve.get_manifest_resources(self.project)

def get_packages_from_sboms(self):
def get_data_from_sboms(self):
"""Get packages data from SBOMs."""
self.packages = resolve.get_packages(
self.packages, self.dependencies = resolve.get_data_from_manifests(
project=self.project,
package_registry=resolve.sbom_registry,
manifest_resources=self.manifest_resources,
model="get_packages_from_sboms",
model="get_data_from_sboms",
)

def create_packages_from_sboms(self):
@@ -71,4 +72,11 @@ def create_packages_from_sboms(self):

def create_dependencies_from_sboms(self):
"""Create the dependency relationship declared in the SBOMs."""
# TODO: Migrate the CycloneDX behavior too, see get_dependencies_from_manifest
resolve.create_dependencies_from_packages_extra_data(project=self.project)

for dependency_data in self.dependencies:
DiscoveredDependency.create_from_data(
project=self.project,
dependency_data=dependency_data,
)
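In the updated `steps()`, `create_dependencies_from_sboms` runs after `create_packages_from_sboms`, so the packages referenced by `for_package_uid` and `resolve_to_package_uid` already exist when `DiscoveredDependency.create_from_data` looks them up. The switch from `.get()` to `get_or_none()` presumably turns an unknown uid into a project-level or unresolved dependency instead of an error. The sketch below illustrates that flow for SPDX, but it is not the PR's `spdx` pipe. It assumes SPDX 2.x JSON input, that only `DEPENDS_ON` relationships are mapped, and that packages were stored with their `SPDXID` as `package_uid`; `dependencies_from_spdx` and `link_dependency` are hypothetical helpers.

```python
import uuid


def get_purl(spdx_package):
    """Return the first "purl" external reference of an SPDX package, if any."""
    for ref in spdx_package.get("externalRefs", []):
        if ref.get("referenceType") == "purl":
            return ref.get("referenceLocator")
    return None


def dependencies_from_spdx(document):
    """Map SPDX DEPENDS_ON relationships to dependency_data mappings.

    "A DEPENDS_ON B" becomes a dependency of package A that resolves to B,
    assuming each package's SPDXID is used as its package_uid.
    """
    packages = {pkg["SPDXID"]: pkg for pkg in document.get("packages", [])}
    dependencies = []
    for rel in document.get("relationships", []):
        if rel.get("relationshipType") != "DEPENDS_ON":
            continue
        target_id = rel["relatedSpdxElement"]
        dependencies.append(
            {
                "purl": get_purl(packages.get(target_id, {})),
                "dependency_uid": str(uuid.uuid4()),
                "for_package_uid": rel["spdxElementId"],
                "resolve_to_package_uid": target_id,
            }
        )
    return dependencies


def link_dependency(dependency_data, packages_by_uid):
    """Mimic the uid lookups in create_from_data: unknown uids give None."""
    for_package = packages_by_uid.get(dependency_data.get("for_package_uid"))
    resolved_to = packages_by_uid.get(dependency_data.get("resolve_to_package_uid"))
    return for_package, resolved_to
```

`link_dependency` returns `(None, None)` when neither uid is known yet, which is why creating the packages before the dependencies matters.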
1 change: 0 additions & 1 deletion scanpipe/pipes/__init__.py
@@ -312,7 +312,6 @@ def get_dependencies(project, dependency_data):
Given a `dependency_data` mapping, get a list of DiscoveredDependency objects
for that `project` with similar dependency data.
"""
dependency = None
dependency_uid = dependency_data.get("dependency_uid")
extracted_requirement = dependency_data.get("extracted_requirement") or ""

11 changes: 6 additions & 5 deletions scanpipe/pipes/cyclonedx.py
@@ -155,12 +155,12 @@ def cyclonedx_component_to_package_data(cdx_component, dependencies=None):
dependencies = dependencies or {}
extra_data = {}

# Store the original bom_ref and dependencies for future processing.
bom_ref = str(cdx_component.bom_ref)
if bom_ref:
extra_data["bom_ref"] = bom_ref
if depends_on := dependencies.get(bom_ref):
extra_data["depends_on"] = depends_on
if depends_on := dependencies.get(bom_ref):
extra_data["depends_on"] = depends_on

# Store the original "bom_ref" as package_uid for dependencies resolution.
package_uid = bom_ref

package_url_dict = {}
if cdx_component.purl:
@@ -176,6 +176,7 @@ def cyclonedx_component_to_package_data(cdx_component, dependencies=None):
extra_data["nestedComponents"] = sorted(nested_purls)

package_data = {
"package_uid": package_uid,
"name": cdx_component.name,
"extracted_license_statement": declared_license,
"copyright": cdx_component.copyright,
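The CycloneDX changes above keep each component's `bom_ref` as `package_uid` and its `depends_on` list in `extra_data`, while the TODO in `load_sbom.py` notes that CycloneDX has not yet been migrated to the `dependency_data` path. The following is a minimal sketch of what such a migration could produce. `dependencies_from_package_data` is hypothetical, and its input is a simplified package mapping with a single `purl` string, whereas the real package data stores the decomposed purl fields.

```python
import uuid


def dependencies_from_package_data(packages_data):
    """Build dependency_data mappings from CycloneDX-derived package data.

    Each mapping is assumed to carry "package_uid" (the original bom-ref), a
    "purl" string, and optionally extra_data["depends_on"] with the bom-refs
    it depends on.
    """
    purl_by_uid = {
        pkg["package_uid"]: pkg.get("purl")
        for pkg in packages_data
        if pkg.get("package_uid")
    }
    dependencies = []
    for pkg in packages_data:
        extra_data = pkg.get("extra_data") or {}
        for ref in extra_data.get("depends_on", []):
            dependencies.append(
                {
                    # None when the referenced bom-ref is not a known component.
                    "purl": purl_by_uid.get(ref),
                    "dependency_uid": str(uuid.uuid4()),
                    "for_package_uid": pkg["package_uid"],
                    "resolve_to_package_uid": ref,
                }
            )
    return dependencies
```

A `depends_on` entry that points to an unknown bom-ref still yields a dependency, but with no purl and nothing to resolve to.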