Merge pull request #9 from helios-pipeline/feat/replace-bullets-with-icons

feat: replaced lists with icons
gjcochran authored Aug 20, 2024
2 parents 00f5b32 + 215170e commit b5aea37
Showing 20 changed files with 239 additions and 796 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
node_modules/
docs/.vitepress/cache/
.DS_Store
33 changes: 33 additions & 0 deletions docs/.vitepress/theme/components/CustomIcon.vue
@@ -0,0 +1,33 @@
<template>
<img v-if="iconSrc" :src="iconSrc" :alt="name" class="custom-icon" />
<span v-else class="icon-placeholder">?</span>
</template>

<script setup>
import { computed } from 'vue'
import { customIcons } from './customIcons.js'
const props = defineProps(['name'])
const iconSrc = computed(() => customIcons[props.name] || null)
</script>

<style scoped>
.custom-icon {
display: inline-block;
width: 1.6em;
height: 1.6em;
vertical-align: middle;
}
.icon-placeholder {
display: inline-block;
width: 1.6em;
height: 1.6em;
vertical-align: middle;
text-align: center;
line-height: 1.6em;
background-color: #ccc;
color: #fff;
font-weight: bold;
}
</style>
31 changes: 31 additions & 0 deletions docs/.vitepress/theme/components/Icon.vue
@@ -0,0 +1,31 @@
<template>
<component :is="iconComponent" v-if="isHeroicon" class="icon" />
<CustomIcon v-else :name="name" class="icon" />
</template>

<script setup>
import { computed } from 'vue'
import * as OutlineIcons from '@heroicons/vue/24/outline'
import CustomIcon from './CustomIcon.vue'
import { customIcons } from './customIcons.js'
const props = defineProps(['name'])
const isHeroicon = computed(() => props.name in OutlineIcons)
const isCustomIcon = computed(() => props.name in customIcons)
const iconComponent = computed(() => isHeroicon.value ? OutlineIcons[props.name] : null)
</script>

<style scoped>
/* .icon { */
/* display: inline-block; */
/* width: 1.6em; */
/* height: 1.6em; */
/* vertical-align: middle; */
/* } */
.icon {
display: block;
width: 1.6em;
height: 1.6em;
}
</style>
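The lookup order implemented by `Icon.vue` and `CustomIcon.vue` above can be sketched in plain JavaScript. This is an illustrative stand-in, not the components themselves: the two registries below mock `@heroicons/vue/24/outline` and `customIcons.js`.

```javascript
// Mock registries standing in for @heroicons/vue/24/outline and customIcons.js.
const OutlineIcons = { CloudIcon: "HeroCloudIcon", BoltIcon: "HeroBoltIcon" };
const customIcons = { "terraform-icon": "./custom_icons/terraform.png" };

function resolveIcon(name) {
  // Heroicons take precedence: Icon.vue renders the matching component directly.
  if (name in OutlineIcons) return { kind: "heroicon", component: OutlineIcons[name] };
  // Otherwise Icon.vue delegates to CustomIcon.vue, which looks up an image path.
  if (name in customIcons) return { kind: "custom", src: customIcons[name] };
  // Unknown names fall through to CustomIcon.vue's "?" placeholder.
  return { kind: "placeholder" };
}

console.log(resolveIcon("terraform-icon")); // { kind: 'custom', src: './custom_icons/terraform.png' }
```

Names found in neither registry render the gray `?` box, which makes a typo in an icon name visible at a glance rather than failing silently.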
6 changes: 6 additions & 0 deletions docs/.vitepress/theme/components/customIcons.js
@@ -0,0 +1,6 @@
export const customIcons = {
"terraform-icon": "./custom_icons/terraform.png",
"aws-icon": "./custom_icons/aws.png",
"kafka-icon": "./custom_icons/kafka.png",
"gcp-icon": "./custom_icons/gcp.png",
};
2 changes: 2 additions & 0 deletions docs/.vitepress/theme/components/index.js
@@ -1,2 +1,4 @@
export { default as BenchmarkTable } from "./BenchmarkTable.vue";
export { default as TippyWrapper } from "./TippyWrapper.vue";
export { default as Icon } from "./Icon.vue";
export { default as HomePage } from "./HomePage.vue";
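This barrel file exists so the theme can register every component globally. The `enhanceApp` body is collapsed in this diff, but the pattern it likely uses can be sketched with a mock `app` object standing in for the Vue app VitePress passes in:

```javascript
// Mock of `import * as components from "./components"` — keys are export names.
const components = {
  BenchmarkTable: { name: "BenchmarkTable" },
  TippyWrapper: { name: "TippyWrapper" },
  Icon: { name: "Icon" },
  HomePage: { name: "HomePage" },
};

// Mock Vue app; the real one arrives via enhanceApp({ app }) in VitePress.
const registered = {};
const app = { component: (name, comp) => { registered[name] = comp; } };

// Register every export under its export name, making <Icon>, <TippyWrapper>,
// etc. usable in any markdown page without per-page imports.
for (const [name, comp] of Object.entries(components)) {
  app.component(name, comp);
}
```

Adding a line to the barrel file is then the only step needed to expose a new component like `Icon` to the docs pages.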
2 changes: 1 addition & 1 deletion docs/.vitepress/theme/index.js
@@ -1,7 +1,7 @@
import DefaultTheme from "vitepress/theme";
import VueTippy from "vue-tippy";
import "tippy.js/dist/tippy.css";
import "./custom.css";
import "./styles/custom.css";
import * as components from "./components";

export default {
@@ -24,4 +24,28 @@

.outline-link {
white-space: wrap !important;
}

.icon-list {
padding-left: 0.5em;
}

.icon-list p {
display: flex;
align-items: flex-start;
margin-bottom: 0.5em;
}

.icon-list .icon {
flex-shrink: 0;
margin-right: 0.5em;
width: 1.6em;
height: 1.6em;
color: #e37626;
}

.icon-list p span {
display: inline-block;
line-height: 1.2;
padding-top: 0.2em;
}
16 changes: 11 additions & 5 deletions docs/automating-deployment.md
@@ -8,18 +8,24 @@ This interface simplifies what would otherwise be complex AWS operations, allowi

How it works:

- When deploying Helios with the CLI, a user will be prompted for an AWS profile name and an optional ChatGPT API key.
- Once credentials are provided, the CLI uses the stored AWS credentials in your local AWS environment configuration files to deploy all Helios infrastructure within your AWS account.
- Under the hood, Helios leverages the AWS Cloud Development Kit (AWS CDK) which we will go into more detail below.
<div class="icon-list">
<p><Icon name="CloudIcon" /><span>When deploying Helios with the CLI, a user will be prompted for an AWS profile name and an optional ChatGPT API key.</span></p>
<p><Icon name="KeyIcon" /><span>Once credentials are provided, the CLI uses the stored AWS credentials in your local AWS environment configuration files to deploy all Helios infrastructure within your AWS account.</span></p>
<p><Icon name="CubeTransparentIcon" /><span>Under the hood, Helios leverages the AWS Cloud Development Kit (AWS CDK), which we cover in more detail below.</span></p>
</div>

![CLI](public/case_study/cli_dropshadow.png)

## AWS CDK

<!-- ![testTerraform](./public/terraform.png) -->

To automate deployment for our users, we evaluated the AWS CDK and Terraform.

- Terraform: an open source infrastructure as code tool that enables declarative configuration of cloud resources across multiple providers, using its own domain-specific language.
- AWS CDK: an infrastructure as code framework for defining cloud infrastructure, which then compiles into CloudFormation YAML templates. This process combines the flexibility of a programming language with the reliability of declarative deployments, allowing developers to use object-oriented techniques to model their infrastructure.
<div class="icon-list">
<p><Icon name="terraform-icon" /><span><strong>Terraform</strong>: an open source infrastructure as code tool that enables declarative configuration of cloud resources across multiple providers, using its own domain-specific language.</span></p>
<p><Icon name="aws-icon" /><span><strong>AWS CDK</strong>: an infrastructure as code framework for defining cloud infrastructure, which then compiles into CloudFormation YAML templates. This process combines the flexibility of a programming language with the reliability of declarative deployments, allowing developers to use object-oriented techniques to model their infrastructure.</span></p>
</div>

Comparing the AWS CDK and Terraform, we found Terraform easier to use and quicker for initial deployments; however, we ultimately preferred the CDK. In our opinion, it offered slightly better debugging capabilities and greater control in defining our infrastructure, particularly for complex, interconnected resources.
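The "compiles into CloudFormation templates" step can be illustrated with a toy model — this is deliberately not the real `aws-cdk-lib` API, just the shape of the idea: constructs are objects defined in code, and a synth step serializes them into a CloudFormation-style document, roughly what `cdk synth` produces.

```javascript
// Toy illustration only — not the aws-cdk-lib API. A construct models a
// resource in code; synth() turns a list of constructs into a
// CloudFormation-style template.
class Bucket {
  constructor(id, props = {}) {
    this.id = id;
    this.props = props;
  }
  toResource() {
    return { Type: "AWS::S3::Bucket", Properties: this.props };
  }
}

function synth(constructs) {
  const Resources = {};
  for (const c of constructs) Resources[c.id] = c.toResource();
  return { AWSTemplateFormatVersion: "2010-09-09", Resources };
}

const template = synth([
  new Bucket("EventArchive", { VersioningConfiguration: { Status: "Enabled" } }),
]);
```

The appeal for us was exactly this split: resources are modeled with ordinary language features (classes, loops, conditionals), while the deployed artifact remains a declarative template.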

30 changes: 18 additions & 12 deletions docs/building-helios.md
@@ -6,9 +6,11 @@ When evaluating storage solutions for real-time event streaming and querying, we

### Our criteria

1. High write throughput: The database must be capable of handling continuous writes of large volumes of streaming data, allowing Helios to ingest data efficiently without bottlenecks.
2. Low-latency query performance: Aggregating large datasets for analytical queries inherently challenges performance, and real-time analytics demands rapid insights for timely decision-making. Specifically, the database should be able to execute aggregation queries spanning multiple columns across tens of millions of rows in under a few seconds.
3. Extensive SQL support: Given SQL's popularity and reliability as a querying language, we prioritized databases that support a wide range of SQL features. This ensures users can leverage complex joins, filters, and aggregations, enhancing the flexibility and depth of their data analysis.
<div class="icon-list">
<p><Icon name="ArrowTrendingUpIcon" /><span>High write throughput: The database must be capable of handling continuous writes of large volumes of streaming data, allowing Helios to ingest data efficiently without bottlenecks.</span></p>
<p><Icon name="BoltIcon" /><span>Low-latency query performance: Aggregating large datasets for analytical queries inherently challenges performance, and real-time analytics demands rapid insights for timely decision-making. Specifically, the database should be able to execute aggregation queries spanning multiple columns across tens of millions of rows in under a few seconds.</span></p>
<p><Icon name="CodeBracketIcon" /><span>Extensive SQL support: Given SQL's popularity and reliability as a querying language, we prioritized databases that support a wide range of SQL features. This ensures users can leverage complex joins, filters, and aggregations, enhancing the flexibility and depth of their data analysis.</span></p>
</div>

### Document-based Storage

@@ -48,11 +50,13 @@ After evaluating these database types, it was clear that columnar-based storage

After deciding on a columnar database, we evaluated a number of database options and ultimately selected Clickhouse as our preferred choice because it met all of our original criteria plus had a few extra standout benefits:

- High write throughput
- Low-latency query performance
- <TippyWrapper content="Provides the most support for ANSI SQL compared to the other columnar databases we evaluated such as Apache Druid and Apache Pinot, allowing users to leverage familiar query syntax and features">SQL Support</TippyWrapper>
- Comprehensive Documentation
- Open Source
<div class="icon-list">
<p><Icon name="ArrowTrendingUpIcon" /><span>High write throughput</span></p>
<p><Icon name="BoltIcon" /><span>Low-latency query performance</span></p>
<p><Icon name="CodeBracketIcon" /><TippyWrapper content="Provides the most support for ANSI SQL compared to the other columnar databases we evaluated such as Apache Druid and Apache Pinot, allowing users to leverage familiar query syntax and features">SQL Support</TippyWrapper></p>
<p><Icon name="DocumentTextIcon" /><span>Comprehensive Documentation</span></p>
<p><Icon name="LockOpenIcon" /><span>Open Source</span></p>
</div>

Of the criteria listed above, ClickHouse's read and write latency particularly impressed us. For more insights into ClickHouse's performance, see our [Load Testing](./load-testing.md) results.

@@ -70,10 +74,12 @@ Contrary to the typical horizontal scaling strategy for other database types lik
Key factors in our decision-making process included:

1. Performance optimization: ClickHouse can efficiently handle massive datasets on a single server
2. Simplicity: Reduced complexity in deployment and management compared to a clustered setup.
3. Cost-effectiveness: Maximizing resource utilization before scaling horizontally.
4. Scalability: Ensuring our architecture can still accommodate future growth when necessary.
<div class="icon-list">
<p><Icon name="CpuChipIcon" /><span>Performance optimization: ClickHouse can efficiently handle massive datasets on a single server.</span></p>
<p><Icon name="CircleStackIcon" /><span>Simplicity: Reduced complexity in deployment and management compared to a clustered setup.</span></p>
<p><Icon name="CurrencyDollarIcon" /><span>Cost-effectiveness: Maximizing resource utilization before scaling horizontally.</span></p>
<p><Icon name="ArrowsPointingOutIcon" /><span>Scalability: Ensuring our architecture can still accommodate future growth when necessary.</span></p>
</div>

Based on these considerations, we opted to host ClickHouse on an Amazon EC2 virtual server with Elastic Block Storage (EBS). This approach allows us to leverage ClickHouse's inherent strengths while maintaining the flexibility for users to scale as needed. Later in the Scaling Helios section, we will go deeper into EBS and other vertical scaling considerations as it pertains to Helios.

31 changes: 22 additions & 9 deletions docs/future-work.md
@@ -6,8 +6,10 @@ The introduction of API endpoints in Helios would significantly enhance user cap

### Use Cases

1. Real-time Dashboards: Endpoints can be integrated with visualization tools like Grafana.
2. Data Integration: Other applications can easily incorporate Helios data into their workflows without having to use our console.
<div class="icon-list">
<p><Icon name="PresentationChartLineIcon" /><span><strong>Real-time Dashboards</strong>: Endpoints can be integrated with visualization tools like Grafana.</span></p>
<p><Icon name="PuzzlePieceIcon" /><span><strong>Data Integration</strong>: Other applications can easily incorporate Helios data into their workflows without having to use our console.</span></p>
</div>

## Supporting Materialized Views

@@ -27,13 +29,24 @@ Our initial release supports data ingestion solely through Amazon Kinesis. While

Future versions of Helios could expand support for additional streaming platforms. To address this, we've outlined an approach to expand our data ingestion capabilities:

- Apache Kafka Integration: As the industry leader, Kafka integration is our top priority. We'll approach this in two stages:
- Amazon MSK Integration: We'll first support Amazon Managed Service for Apache Kafka, leveraging our existing Lambda function logic.
- Direct Kafka Support: Following this, we'll re-architect our Lambda Connector function for direct Kafka integration.
- Additional Platforms: After Kafka, we would extend support to other popular streaming platforms, including:
- Google Pub/Sub
- Redpanda
- Confluent _(a popular Kafka managed service)_
1. Apache Kafka Integration: As the industry leader, Kafka integration is our top priority. We'll approach this in two stages:

- Amazon MSK Integration: We'll first support Amazon Managed Service for Apache Kafka, leveraging our existing Lambda function logic.
- Direct Kafka Support: Following this, we'll re-architect our Lambda Connector function for direct Kafka integration.
<!-- <div class="icon-list"> -->
<!-- <p><Icon name="aws-icon" /><span>Amazon MSK Integration: We'll first support Amazon Managed Service for Apache Kafka, leveraging our existing Lambda function logic</span></p> -->
<!-- <p><Icon name="kafka-icon" /><span>Direct Kafka Support: Following this, we'll re-architect our Lambda Connector function for direct Kafka integration.</span></p> -->
<!-- </div> -->

2. Additional Platforms: After Kafka, we would extend support to other popular streaming platforms, including:
- Google Pub/Sub
- Redpanda
- Confluent _(a popular Kafka managed service)_
<!-- <div class="icon-list"> -->
<!-- <p><Icon name="gcp-icon" /><span>Google Pub/Sub</span></p> -->
<!-- <p><Icon name="" /><span>Redpanda</span></p> -->
<!-- <p><Icon name="" /><span>Confluent _(a popular Kafka managed service)_</span></p> -->
<!-- </div> -->

This expansion will require a number of changes to our current implementation. Our existing serverless function, which ingests data from Kinesis and inserts it into ClickHouse DB, relies on AWS Lambda Triggers.

25 changes: 16 additions & 9 deletions docs/helios-architecture.md
@@ -2,9 +2,11 @@

To meet the requirements of Amazon Kinesis users looking to explore and analyze their event streams, Helios is built around three key components:

1. Storage \- A database optimized for querying streaming data, specifically an Online Analytical Processing (OLAP) database capable of handling high-volume, real-time data ingestion, and delivering fast query performance for analytical workloads.
2. Connection \- An ingestion mechanism to efficiently transfer events from Kinesis streams into our chosen database.
3. Interface \- A user-friendly graphical interface allowing users to conduct analyses and visualize results.
<div class="icon-list">
<p><Icon name="CircleStackIcon" /><span><strong>Storage</strong> - A database optimized for querying streaming data, specifically an Online Analytical Processing (OLAP) database capable of handling high-volume, real-time data ingestion, and delivering fast query performance for analytical workloads.</span></p>
<p><Icon name="LinkIcon" /><span><strong>Connection</strong> - An ingestion mechanism to efficiently transfer events from Kinesis streams into our chosen database.</span></p>
<p><Icon name="WindowIcon" /><span><strong>Interface</strong> - A user-friendly graphical interface allowing users to conduct analyses and visualize results.</span></p>
</div>

![Core Arch](public/case_study/core_full_color.png)

@@ -42,10 +44,13 @@ Efficiently transferring events from Kinesis streams to our ClickHouse database

To address this, we developed a custom AWS Lambda function as our stream processor. This approach allows us to:

- Decode and parse Kinesis event data
- Implement custom error handling
- Dynamically map data to appropriate ClickHouse tables
- Perform efficient inserts
<div class="icon-list">
<p><Icon name="DocumentMagnifyingGlassIcon" /><span>Decode and parse Kinesis event data</span></p>
<p><Icon name="ExclamationTriangleIcon" /><span>Implement custom error handling</span></p>
<p><Icon name="TableCellsIcon" /><span>Dynamically map data to appropriate ClickHouse tables</span></p>
<p><Icon name="ArrowDownOnSquareIcon" /><span>Perform efficient inserts</span></p>
</div>
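The four responsibilities listed above can be outlined in plain JavaScript. This is a hypothetical sketch — the actual handler, its table-mapping rules, and the ClickHouse client are not part of this diff, and the `table` field used for routing is an assumption:

```javascript
// Hypothetical outline of the stream processor's core steps; insertFn stands
// in for a batched ClickHouse insert call.
function processKinesisRecords(event, insertFn) {
  const rowsByTable = {};
  for (const record of event.Records) {
    let row;
    try {
      // 1. Decode and parse: Kinesis delivers payloads as base64-encoded bytes.
      row = JSON.parse(Buffer.from(record.kinesis.data, "base64").toString("utf8"));
    } catch (err) {
      // 2. Custom error handling: skip malformed records rather than failing the batch.
      console.error("skipping malformed record:", err.message);
      continue;
    }
    // 3. Dynamic table mapping: assumed here to come from a field on the event itself.
    const table = row.table || "events";
    if (!rowsByTable[table]) rowsByTable[table] = [];
    rowsByTable[table].push(row);
  }
  // 4. Efficient inserts: one batched insert per destination ClickHouse table.
  for (const [table, rows] of Object.entries(rowsByTable)) insertFn(table, rows);
  return rowsByTable;
}
```

Grouping rows before inserting matters for ClickHouse in particular, which strongly favors few large inserts over many single-row writes.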

By leveraging Lambda, we created a flexible and scalable solution tailored to our specific data processing needs. Let's explore how this custom processor works in detail.

@@ -75,8 +80,10 @@ While the storage and connection components form the backbone of Helios, the ana

The Helios web application, hosted on an Amazon EC2 instance, serves as the primary interface for users. Implemented with a Flask backend and a React frontend, its core features include:

1. An interactive SQL console for querying data from event streams, enabling real-time data analysis
2. An interface for connecting a data source, such as a Kinesis stream, to the Helios architecture
<div class="icon-list">
<p><Icon name="CommandLineIcon" /><span>An interactive SQL console for querying data from event streams, enabling real-time data analysis</span></p>
<p><Icon name="LinkIcon" /><span>An interface for connecting a data source, such as a Kinesis stream, to the Helios architecture</span></p>
</div>

Now that you have a good understanding of how Helios works, the next section covers why we designed it this way and the trade-offs we made while building it. Here is our architecture so far:
![Core Arch](public/case_study/core_full_color.png)
