
3. Case study

AAnzel edited this page Jan 17, 2022 · 4 revisions

3.0. Introduction

To demonstrate the usability of MOVIS, we present a case study based on one of the available examples. The case study is also built into MOVIS as one of the options on the navigation sidebar. We focus on the built-in Example 1, which contains metagenomics, metaproteomics, metatranscriptomics, metabolomics, and physico-chemical data from a biological wastewater treatment plant (BWWTP). The data were collected in situ at weekly intervals over 14 months. The goal of this case study is to reveal whether there are any niche types and, if so, how they respond to substrate changes. The case study draws on several functional omics aspects, as presented in the original manuscript.

Below you can see an overview of the case study, as presented in MOVIS.

Figure 0a, Figure 0b, Figure 0c, Figure 0d, Figure 0e

3.1. Main findings

Even though BWWTP operation is a controlled process, factors such as aeration cycles, seasonal changes in temperature, and the composition of the inflow wastewater fluctuate. These factors may have a meaningful impact on population dynamics and the linked process efficiency. Therefore, the first step of our case study is to inspect relevant physico-chemical properties of the wastewater and determine whether any major shifts occur.

3.1.1. Physico-chemical data

For this purpose, we selected the Physico-Chemical data and, more specifically, Processed data set 1. This data set contains 34 physico-chemical properties sampled every 2 hours. The properties of interest for this case study are Volume_aeration m3/h, T C (temperature in degrees Celsius), and Inflow_conductivity μS/cm. Visualizing these properties with the Feature through time visualization produced Figures 1a, 1b, and 1c.

Figure 1a

Figure 1b

Figure 1c

Figure 1. Relevant physico-chemical properties of the wastewater sludge. A seasonal pattern is present in Figure 1a, while Figures 1b and 1c show more irregular readings.

Despite considerable noise, Figure 1a shows a sinusoid with three distinct peaks (around June 2011, November 2011, and April 2012) and two distinct dips (around August 2011 and January 2012). Figures 1b and 1c are less noisy, with Figure 1b showing a distinct increase in temperature near the end of the sampling period. To further examine the irregular behavior of temperature and conductivity, we also visualized them using the Time Heatmap. The resulting visualizations are presented in Figures 2a and 2b.
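The peaks and dips above were identified visually. For readers who want to check such turning points programmatically, the sketch below smooths a series with a centered moving average and then lists strict local maxima and minima. It is an illustration under our own assumptions, not a MOVIS feature: the helper names `moving_average` and `turning_points` are hypothetical, the data are synthetic, and a real 2-hour series would need a much wider window.

```python
import math


def moving_average(values, window):
    """Centered moving average over full windows only ("valid" mode).

    Element j of the result is the mean of values[j : j + window], so for an
    odd window it is centered on values[j + window // 2].
    """
    return [sum(values[j:j + window]) / window for j in range(len(values) - window + 1)]


def turning_points(values):
    """Return (peaks, dips): indices of strict local maxima and minima."""
    peaks, dips = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append(i)
        elif values[i - 1] > values[i] < values[i + 1]:
            dips.append(i)
    return peaks, dips


if __name__ == "__main__":
    # Synthetic stand-in for a noisy property: two slow sine cycles over 200
    # samples plus a fast term that repeats every 3 samples.
    raw = [math.sin(2 * math.pi * i / 100) + 0.3 * (i % 3 - 1) for i in range(200)]
    print("turning points in raw data:", sum(len(t) for t in turning_points(raw)))

    # A 9-sample window spans three full noise cycles, so the noise averages out.
    window = 9
    smooth = moving_average(raw, window)
    peaks, dips = turning_points(smooth)
    offset = window // 2  # map smoothed indices back to original sample indices
    print("peaks:", [p + offset for p in peaks])
    print("dips:", [d + offset for d in dips])
```

Computing the average over full windows only avoids spurious extrema at the series edges, at the cost of losing `window // 2` samples on each side.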

Figure 2a

Figure 2b

Figure 2. A closer inspection of the temperature and inflow conductivity of the wastewater sludge. Figure 2a shows steady values in the upper half of the heatmap and considerable variation in the lower half. Figure 2b shows a rapid transition from higher to lower conductivity values, roughly one-third of the way into the sampling period.

With this finer granularity, we get a detailed picture of each property. Figure 2a shows a steady increase in temperature that peaks in September 2011, followed by a slight decrease until the start of December 2011. A rapid decrease then lasts until March 2012 and is followed by a rapid increase that peaks at the beginning of April 2012, with temperatures reaching 29.62°C. Exact values can be inspected in the tooltip that appears when hovering over a cell of interest. Figure 2b, on the other hand, shows high but steady values from the beginning of sampling until the last week of August 2011, followed by a swift decrease and a stabilization of values that persists throughout the rest of the time series.
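To make the heatmap layout more concrete, here is a minimal sketch of how timestamped measurements could be bucketed into heatmap cells, assuming one column per calendar date, one row per hour of day, and the mean of all readings falling into a cell as its color value. This layout is our assumption for illustration; MOVIS's Time heatmap may arrange its axes differently, and `heatmap_grid` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def heatmap_grid(samples):
    """Bucket (timestamp, value) pairs into {(date, hour): mean value} cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        key = (ts.date(), ts.hour)  # x = calendar date, y = hour of day
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Hypothetical temperature readings every 2 hours over two days.
    start = datetime(2011, 9, 1)
    samples = [(start + timedelta(hours=2 * i), 20.0 + 0.1 * i) for i in range(24)]
    grid = heatmap_grid(samples)
    print(len(grid), "cells")
    # A tooltip-style lookup of a single cell (2 September 2011, 06:00).
    print(grid[(datetime(2011, 9, 2).date(), 6)])
```

Averaging per cell keeps the grid well defined even if several readings land in the same date and hour.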

3.1.2. Metaproteomics data

Integrated meta-omics approaches hold the potential to resolve niches of microbial populations in situ. Therefore, we now shift our focus to the metaproteomics data to identify microbial clusters, if there are any, using only raw FASTA data. To do so, we selected Metaproteomics data and then Raw FASTA files. Once MOVIS finished embedding the FASTA files, we selected the K-Means clustering method and, guided by the Elbow rule chart, chose three as the number of clusters (centroids). We then inspected the evaluation window of the clustering method. A silhouette score of 0.495 indicates a reasonable separation of the clusters, which is further corroborated by the other available evaluation scores. Next, we visualized the data with two different dimensionality reduction techniques to determine which one gives the clearer visual outcome. Selecting PCA visualization and MDS visualization produced Figures 3a and 3b, respectively.
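The evaluation step can be illustrated outside MOVIS. The sketch below runs K-Means (Lloyd's algorithm with k-means++ seeding and several restarts) on synthetic 2-D points standing in for FASTA embeddings, prints the within-cluster sum of squares that an elbow chart plots against k, and computes the mean silhouette coefficient s(i) = (b - a) / max(a, b), where a is a point's mean distance to the rest of its own cluster and b its mean distance to the nearest other cluster. This is a plain NumPy illustration written for this page; MOVIS may use a library implementation with different seeding and defaults, so its scores are not expected to match.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def _init_plus_plus(X, k, rng):
    """k-means++ seeding: pick new centroids with probability ~ squared distance."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with restarts; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _init_plus_plus(X, k, rng)
        for _ in range(n_iter):
            labels = _assign(X, centroids)
            # Move each centroid to the mean of its points (keep it if empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        labels = _assign(X, centroids)
        inertia = float(((X - centroids[labels]) ** 2).sum())  # within-cluster SS
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def mean_silhouette(X, labels):
    """Mean of s(i) = (b - a) / max(a, b); requires at least two clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # excludes the zero self-distance
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Three synthetic blobs standing in for sequence embeddings.
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
    for k in range(1, 6):
        labels, _, inertia = kmeans(X, k)
        sil = mean_silhouette(X, labels) if k > 1 else float("nan")
        print(f"k={k}  inertia={inertia:9.2f}  silhouette={sil:.3f}")
```

Silhouette values range from -1 to 1, with values near 1 indicating compact, well-separated clusters; a score around 0.5 therefore suggests clear but imperfect separation.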

Figure 3a

Figure 3b

Figure 3. Clustered FASTA embeddings of the metaproteomics data set. Figure 3a uses the PCA dimensionality reduction technique to visualize the embedded data, while Figure 3b uses the MDS technique.
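As a reminder of what the PCA panel shows, the sketch below centers an embedding matrix and projects it onto its first two principal components using NumPy's singular value decomposition, also reporting each component's share of the variance. This is generic textbook PCA on synthetic input, not MOVIS's code. MDS, used for Figure 3b, instead searches for a 2-D layout that preserves pairwise distances and is not shown here.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components.

    Returns (scores, explained_variance_ratio) for the two components.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                       # coordinates in PC space
    explained = (S ** 2) / (S ** 2).sum()        # variance share per component
    return scores, explained[:2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 100-dimensional embeddings for 50 sequences.
    X = rng.normal(size=(50, 100))
    scores, ratio = pca_2d(X)
    print(scores.shape)
    print(ratio)
```

A low explained-variance share for the first two components is a hint that a 2-D PCA view may hide structure, which is one reason to compare it with a distance-preserving method such as MDS.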

Both figures provide a visual way to evaluate the chosen clustering method. In both Figures 3a and 3b, the upper-left corner shows a mixture of class-0 (circles) and class-2 (triangles), indicating that K-Means struggled to separate the embeddings in that region. Overall, however, the clustering was successful, since most embeddings fall into visually coherent clusters. The color gradient of the marks reveals even more: most samples in class-1 (rectangles) come from the later part of the sampling period. The same holds for class-2 samples, whereas class-0 samples are spread more widely across the sampling period. Hovering over a sample shows the exact time it was collected. MOVIS also supports calculating amino-acid-based physico-chemical properties of the metaproteomics data set, which could uncover an even more detailed picture of the underlying phenomena. For the sake of brevity, we did not select that option.
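As an example of an amino-acid-based physico-chemical property, the grand average of hydropathy (GRAVY) is the mean Kyte-Doolittle hydropathy value over a sequence's residues. The sketch below computes it in plain Python together with a minimal FASTA parser. Which properties MOVIS actually computes is not specified on this page, so treat this as an illustration only; the helper names are hypothetical, and skipping non-standard residues (such as X or *) is our own design choice. Biopython's `ProteinAnalysis.gravy()` offers the same measure.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy over the standard residues in a sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


def read_fasta(text):
    """Minimal FASTA parser: returns {header: sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    fasta = """>protein_1 hypothetical example
MKTAYIAKQR
QISFVKSHFS
>protein_2 hypothetical example
MDEKRRDEEQ
"""
    for name, seq in read_fasta(fasta).items():
        print(f"{name}: length={len(seq)}, GRAVY={gravy(seq):.3f}")
```

Positive GRAVY values indicate overall hydrophobic sequences and negative values hydrophilic ones.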

3.1.3. Metabolomics data

Since a significant shift in the substrates of the influent wastewater sludge can alter the community composition, we moved our focus to the Metabolomics data set, and more specifically to Processed data set 2. The selected data set is composite, which means that it contains multi-omic information: almost 95% of it is metabolomics data, and the rest is physico-chemical data. Pre-combining omics data in this way lets MOVIS approximate an integrative multi-omics analysis, a feature that is planned but not yet directly available in MOVIS.

Next, we selected the Time heatmap visualization and chose the feature named value as the quantitative color feature, param as the y-axis feature, and Diverging as the color scheme. This selection produced Figure 4. Closer inspection of Figure 4 reveals a substrate shift beginning in early to mid-November 2011 and ending in early to mid-December 2011, with noticeably higher values in between. The shift was characterized by higher values mainly of non-polar metabolites, but also of polar metabolites, among them putrescine and various disaccharides. After the end of December 2011, substrate levels normalized, and the community transitioned back to the pre-disturbance state.
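The composite data set is in long format: each row carries a timestamp, a `param` name, and a `value`. The sketch below shows, with hypothetical rows, how such records map onto the heatmap cells (one row per `param`, one column per date), and how values could be standardized per parameter so that a diverging color scheme centered at zero highlights departures from each parameter's typical level. The per-parameter z-scoring and the parameter names are our assumptions for illustration; MOVIS may color the raw values directly.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pivot_long(rows):
    """Turn (date, param, value) rows into {param: {date: value}}."""
    table = defaultdict(dict)
    for date, param, value in rows:
        table[param][date] = value
    return dict(table)


def zscore_by_param(table):
    """Standardize each parameter's series to mean 0, std 1 (std 0 gives zeros)."""
    out = {}
    for param, series in table.items():
        vals = list(series.values())
        mu, sigma = mean(vals), pstdev(vals)
        out[param] = {d: (v - mu) / sigma if sigma else 0.0 for d, v in series.items()}
    return out


if __name__ == "__main__":
    # Hypothetical long-format rows: (date, param, value).
    rows = [
        ("2011-10-25", "sp_putrescine", 1.1), ("2011-11-15", "sp_putrescine", 4.0),
        ("2011-12-20", "sp_putrescine", 1.0), ("2011-10-25", "snp_example", 0.5),
        ("2011-11-15", "snp_example", 2.5), ("2011-12-20", "snp_example", 0.6),
    ]
    for param, series in zscore_by_param(pivot_long(rows)).items():
        print(param, {d: round(z, 2) for d, z in series.items()})
```

Standardizing per parameter keeps metabolites with very different magnitudes comparable on a single diverging scale.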

Figure 4

Figure 4. Metabolite and physico-chemical values over time. A major shift in multiple parameters can be clearly observed around November 2011. Abbreviations: bnp, intracellular nonpolar metabolites; bp, intracellular polar metabolites; ratio, metabolite intracellular/extracellular ratio; snp, extracellular nonpolar metabolites; sp, extracellular polar metabolites.

3.1.4. Metagenomics data

One way of estimating population abundance is by using metagenomic depth-of-coverage. Since MOVIS is not a dedicated meta-omics analysis tool, taxa linking is currently not supported. However, by inspecting average depth-of-coverage values, we can still gain some insight into the overall population dynamics over time. Therefore, we selected Metagenomics data and then Depth-of-coverage, which presented us with a directory hierarchy of the underlying data set. After MOVIS automatically calculated the summary statistics of the data set, we visualized the results using the Whisker plot, shown in Figure 5.
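The statistics behind a whisker plot can be summarized as in the sketch below, which computes quartiles with Python's `statistics.quantiles` and places the whisker limits at Tukey's 1.5 × IQR fences, flagging values beyond them as outliers. The quartile method and the 1.5 × IQR rule are common conventions we assume here; MOVIS's exact definitions of the upper and lower limits may differ, and the coverage values are hypothetical.

```python
from statistics import quantiles


def whisker_stats(values, k=1.5):
    """Box-plot summary with Tukey fences at k * IQR beyond the quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        # Whiskers extend to the most extreme data points within the fences.
        "lower": min(inside), "upper": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


if __name__ == "__main__":
    # Hypothetical depth-of-coverage values (e.g. per contig) for two sampling dates.
    coverage = {
        "2011-10-25": [3.1, 4.0, 5.2, 6.8, 7.5, 9.9, 30.2],
        "2011-11-01": [2.9, 3.5, 4.8, 6.1, 8.0, 55.0, 61.3],
    }
    for date, values in coverage.items():
        print(date, whisker_stats(values))
```

Because the fences depend on the quartiles, a date with many extreme values also shifts Q3 and the upper limit, which is one way outliers can distort a box-plot series.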

Figure 5

Figure 5. Metagenomics depth-of-coverage over time. A sinusoidal wave formed by the third quartile and upper limit values can be observed.

The third quartile (Q3) and upper limit values form a sinusoid with a period of around one month, with a slight discrepancy around the beginning of November 2011. The discrepancy is caused by an increase in outliers (not shown in Figure 5) that affects the calculated statistics. We then visualized the mean depth-of-coverage values using the Feature through time visualization, as shown in Figure 6. However, this did not provide any new insight.
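The roughly one-month period was read off the plot. One way to check such a periodicity numerically is to compute the autocorrelation of the weekly Q3 series and take the first lag at which it reaches a local maximum. The sketch below does this in plain Python on a synthetic series; the helper names are hypothetical and the method is a generic one, not something MOVIS provides. Note that with weekly samples a one-month cycle spans only about four points, and periods shorter than two weeks cannot be resolved at all.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given non-negative lag."""
    n = len(series)
    mu = sum(series) / n
    var = sum((x - mu) ** 2 for x in series)
    cov = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag))
    return cov / var


def dominant_period(series, max_lag):
    """First lag in [2, max_lag] where the autocorrelation has a local maximum."""
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 2)]
    for lag in range(2, max_lag + 1):
        if acf[lag - 1] < acf[lag] > acf[lag + 1]:
            return lag
    return None  # no periodicity detected up to max_lag


if __name__ == "__main__":
    # Hypothetical weekly Q3 values: a 4-week cycle on top of a baseline of 10.
    q3 = [10 + 2 * math.sin(2 * math.pi * week / 4) for week in range(60)]
    print("estimated period (weeks):", dominant_period(q3, max_lag=12))
```

The function requires `max_lag + 1` to be smaller than the series length so that every lag it inspects has overlapping samples.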

Figure 6

Figure 6. Metagenomics mean depth-of-coverage over time. Mean metagenomics depth-of-coverage values are visualized using Feature through time visualization. The periodic nature of the underlying data is not clearly visible in this figure.

3.2. Conclusion

Exploring multiple meta-omics data sets simultaneously with MOVIS allowed us to uncover temporal patterns and discrepancies in one data set and efficiently connect them with the other data sets where such connections existed. Furthermore, swift visualization of sizable multi-modal time-series data sets revealed significant microbial clusters and temporal points of interest. With these findings, we can now analyze the temporal points of interest further with meta-omics-specific tools and uncover meta-omics-specific details.