Getting started

The service is currently available for free as a beta version. We would be grateful for any feedback, positive or negative, that helps us improve the service. Please leave it using the Feedback button.

You can get an overview of all features and see sample analytical reports for several published datasets. To view them, please log in to demo mode using the following credentials: login, password knomicsdemo (view-only).

To analyze your own metagenomes, please complete the following steps (Note: only the 16S rRNA sequencing data format is currently supported).

  1. Register an account through the “Sign Up” section.
  2. Sign in and create a new project in your account. A single project is intended to contain all metagenomes from one study.
  3. Open the project on the Projects page and upload your metagenomic datasets. Available input formats are FASTQ and FASTA (or their archives). The reads should be uploaded in a “one file, one sample” way. At this point you can already run the Basic report and the External comparison report.
  4. If you want to analyze the links between microbiota composition and various factors (in Case-control, Factor analysis or Meta-analysis reports), please upload a metadata file in delimited values format (details are below). The file should be uploaded through the same section as the metagenomic data.
  5. Run one or several of the desired analyses.
  6. The analysis status (Processing, Scheduled, Ready or Error) is displayed on the Project page.
  7. After the analysis is complete, you can view its results as an interactive report.
  8. If you want to share the report, press the “Share report” button and send the public link to anyone. You can press the same button again to stop sharing.
  9. If the analysis produces an Error status and the message is not self-explanatory, please contact us using the Feedback button. We will respond as soon as possible.

Report description:

Basic report

The Basic report is produced in essentially any analysis. It includes raw data quality estimation and taxonomic analysis (alpha-diversity and microbial relative abundance calculation). It also includes hierarchical clustering of samples, taxa co-occurrence analysis (Kurtz et al., 2015), enterotyping (Arumugam et al., 2011) and metabolic pathway reconstruction, specifically for vitamin and short-chain fatty acid biosynthesis.

External comparison report

The External comparison report is very similar to the Basic report, but it includes external metagenomic datasets in addition to the user’s data. This allows you to view your data in a global context.

Factor analysis report

The Factor analysis report is available if metadata is provided. Here Knomics biota performs a multifactor analysis to reveal dependencies between microbiome composition and the sample descriptions provided. In particular, it examines differences between sample groups in overall composition and in individual microbiome features.

Meta-analysis report

The Meta-analysis report combines the External comparison report and the Factor analysis report. Specifically, it performs a multifactor analysis of both external and user data, using the metadata factors they have in common.

Case-control report

The Case-control report is available for studies in which non-paired samples are separated into two groups. Examples include a study of a probiotic product’s influence on the gut microbiome versus placebo, or a comparison of its influence on infants and adults.

Paired analysis report

The Paired analysis report is a more specific version of the Case-control report for paired samples, for example samples taken from the same subjects before and after an intervention.

Preparing your metadata file

Metadata, that is, additional information about each sample (age, body-mass index, clinical status, etc.), should be uploaded as a delimited text file in which each column corresponds to a single factor and each row to a single sample. The delimiter can be a comma, a tab or a custom symbol.

In general, the metadata file should contain at least one column, the sample IDs, under the reserved name “sample”. Each sample ID should match the filename of the respective readset without its extension. However, if the filename (without extension) ends with _R1 or _R2 (typical for the output of Illumina sequencers), this suffix should be excluded from the sample ID in the metadata. During analysis, all “_” symbols will be substituted with “.” symbols. Examples of filenames and the expected sample names in the metadata file: SRS1234.fastq.gz corresponds to SRS1234, and SRS5678_R1.fastq.gz corresponds to SRS5678.
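The naming rule above can be sketched in a few lines of Python. This is only an illustration for preparing your metadata locally, not the service's own code. The function name and the lists of recognized extensions are assumptions; the documentation names only FASTQ, FASTA and "their archives".

```python
import re
from pathlib import Path

# Assumed extension lists (hypothetical); adjust them to the files you actually upload.
ARCHIVE_EXTENSIONS = {".gz", ".zip", ".bz2"}
READ_EXTENSIONS = {".fastq", ".fq", ".fasta", ".fa", ".fna"}


def sample_id_from_filename(filename: str) -> str:
    """Return the sample ID expected in the "sample" column for a read file."""
    name = Path(filename).name
    # Strip an archive extension first, then a read-format extension.
    for extensions in (ARCHIVE_EXTENSIONS, READ_EXTENSIONS):
        suffix = Path(name).suffix
        if suffix.lower() in extensions:
            name = name[: -len(suffix)]
    # Drop the Illumina read-direction suffix (_R1 or _R2), if present.
    return re.sub(r"_R[12]$", "", name)


print(sample_id_from_filename("SRS1234.fastq.gz"))
print(sample_id_from_filename("SRS5678_R1.fastq.gz"))
```

For the two filenames given as examples in the text, the function returns SRS1234 and SRS5678.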

If you want to compare two groups of samples within your data (Case-control analysis), the metadata should include at least two columns: the sample IDs and the membership of each sample in the case or control group (in a column with the reserved name “case_control”). Each value in the latter column must be either “case” or “control”.
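A minimal case-control metadata file could be produced as in the sketch below, which uses Python's standard csv module. The helper name and the sample IDs are illustrative assumptions; only the reserved column names “sample” and “case_control” and the two allowed values come from the description above.

```python
import csv
import io

ALLOWED_GROUPS = {"case", "control"}


def build_case_control_metadata(groups: dict, delimiter: str = ",") -> str:
    """Return delimited metadata text with the columns 'sample' and 'case_control'.

    `groups` maps each sample ID to either "case" or "control".
    """
    bad = {sample: group for sample, group in groups.items() if group not in ALLOWED_GROUPS}
    if bad:
        raise ValueError(f"case_control must be 'case' or 'control': {bad}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["sample", "case_control"])
    for sample, group in groups.items():
        writer.writerow([sample, group])
    return buffer.getvalue()


# Hypothetical sample IDs, reused from the filename examples.
metadata_text = build_case_control_metadata({"SRS1234": "case", "SRS5678": "control"})
print(metadata_text)
```

Saving the returned text to a file (for example metadata.csv, a name chosen here for illustration) gives a file that can be uploaded through the same section as the reads.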

If you want to analyze your metagenomes and metadata in comparison with external metagenomes and metadata (Meta-analysis), it is possible to use the following factors (and reserved column names) provided for all external datasets:

  • sample - sample ID
  • age - age of the subject, in years
  • gender - M or F
  • bmi - body-mass index
  • icd10 - disease of the subject, according to the International Classification of Diseases (e.g., G06.1, A06.9). If a subject is healthy, the term "healthy" should be used.
  • country - for example, Russia, USA
  • case_control - as described above (can be equal to one of two values - case or control)
  • antibiotics - logical value indicating whether the subject has recently undergone antibiotic therapy (can be equal to “true” or “false”)
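Before uploading, it can be convenient to check a metadata file against the reserved columns listed above. The following is a local pre-upload sketch, not part of the service. The function name, the error messages and the decision to skip empty cells are assumptions. The icd10 and country columns are not checked, because their allowed values are open-ended.

```python
import csv
import io


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


# Per-column checks derived from the reserved column descriptions.
COLUMN_CHECKS = {
    "age": _is_number,
    "bmi": _is_number,
    "gender": lambda v: v in {"M", "F"},
    "case_control": lambda v: v in {"case", "control"},
    "antibiotics": lambda v: v in {"true", "false"},
}


def check_metadata(text: str, delimiter: str = ",") -> list:
    """Return a list of human-readable problems found in the metadata text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if "sample" not in (reader.fieldnames or []):
        return ["missing required column 'sample'"]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column, is_valid in COLUMN_CHECKS.items():
            value = row.get(column)
            # Assumption: absent columns and empty cells are treated as "not provided".
            if value is None or value == "":
                continue
            if not is_valid(value):
                problems.append(
                    f"line {line_no}: column '{column}' has unexpected value '{value}'"
                )
    return problems


example = "sample,age,gender,antibiotics\nS1,34,F,false\nS2,abc,X,yes\n"
for problem in check_metadata(example):
    print(problem)
```

In this example the first data row passes, while the second is reported three times, once each for age, gender and antibiotics.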

Removal of contamination using negative control

You can remove the reads likely corresponding to technical contamination if you have negative control (NC) sample(s) sequenced for your project. Follow these steps:

1. Obtaining the list of contaminant sequences

1.1 Create a new project. Upload the read files of the microbiome and the NC.

1.2 Run a Basic Report (without uploading the list of contaminants).

1.3 Upload a dedicated metadata file containing one factor that specifies whether each sample is an NC or not. Any character values, such as “yes”/“no”, may be used in the column. While configuring the metadata, specify the “sample_is_negative_control” factor type for this column and choose “True” for samples that are NCs and “False” for those that are not.

1.4 Run a Contamination Report. It summarizes the prevalence of likely contaminants in the samples, based on the composition of the NC sample(s) together with a custom-compiled list of several common laboratory contaminants.

1.5 Based on the compositions provided in this report, compile a manually checked list of contaminants as a text file in which each line is a contaminant sequence to be removed from all samples. Note that, depending on the environment studied, it may be reasonable to keep certain sequences (for example, a soil microbiome may genuinely be enriched in taxa frequently reported as lab contaminants).
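Once the list has been checked by hand, the text file itself can be assembled with a short script. The sketch below assumes one contaminant entry per line; it only removes whitespace, blank entries and duplicates. The function name and the example sequences are hypothetical, and the exact format accepted by the upload button should be confirmed against the report output.

```python
def format_contaminant_list(sequences) -> str:
    """Return the contaminant list as text, one unique entry per line."""
    seen = set()
    lines = []
    for sequence in sequences:
        # Remove all whitespace, including line breaks inside a wrapped sequence.
        cleaned = "".join(sequence.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical, manually checked entries; real entries come from the Contamination Report.
checked = ["ACGTACGT", " ACGTACGT\n", "", "TTGACCA"]
contaminant_text = format_contaminant_list(checked)
print(contaminant_text)
```

Writing contaminant_text to a plain text file gives the list to upload in step 2. Entries that you decided to keep in the data are simply left out of the input.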

2. Data analysis with contaminant sequences removed:

2.1 Create a new project, upload the reads (inclusion of the NC sample(s) is not mandatory).

2.3 Before running a Basic Report, upload the list of contaminants using the button near the “Negative control cleaning” label.

2.4 Run a Basic Report.

Now you are all set to run any downstream analyses on cleaned data.

Please note: it is not possible to modify the list of contaminants after running the Basic Report; to change it, you will have to create a new project.