diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..87620ac --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.ipynb_checkpoints/ diff --git a/Submodule 0/Submodule_0_Tutorial_1_GithubDownload.ipynb b/Submodule 0/Submodule_0_Tutorial_1_GithubDownload.ipynb index b5b133c..15f3e12 100755 --- a/Submodule 0/Submodule_0_Tutorial_1_GithubDownload.ipynb +++ b/Submodule 0/Submodule_0_Tutorial_1_GithubDownload.ipynb @@ -9,11 +9,6 @@ "--------------------------------------------" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -120,7 +115,7 @@ "\n", "Or, perhaps you are going to use more of this module (Introduction to Github and Python for Bioinformatics). \n", "\n", - "You should click on the link to that module from the list. It will take you to another Github Repository. Of course, the interfaces will also start with the file structure as all Github repositories look the same. These Modules typically contain a Readme.md file that explains the structure & contents of the overall tutorial module. Inside of each Module are folders (\"Submodules\") that contain the lessons in the form of Jupyter Notebooks (next lesson in this list)\n", + "You should click on the link to that module from the list. It will take you to another Github Repository. Of course, the interfaces will also start with the file structure as all Github repositories look the same. These modules typically contain a README.md file that explains the structure & contents of the overall tutorial module. Inside of each module are folders (\"Submodules\") that contain the lessons in the form of Jupyter Notebooks (next lesson in this list)\n", "\n", "
Attention: Please go to that module in another tab on your browser before following the next set of directions.
\n" ] @@ -136,7 +131,7 @@ "
\n", "So, in order to use the contents of any of the learning modules in the Sandbox, you need to copy them to a \"virtual machine\" (like moving to the hard drive of a computer running in GCP, Azure, or AWS). Github is quite prepared for this, in fact, it is one key purpose of the NIGMS using Github as a place for you to collect the materials.\n", "
\n", - "There isn't a normal download button that you can see in a repository. At Github, most of the materials are code that developers want to download. Thus, you will click on the green code button to get the #option# to download or to copy.\n", + "There isn't a normal download button that you can see in a repository. At Github, most of the materials are code that developers want to download. Thus, you will click on the green code button to get the **option** to download or to copy.\n", "
\n", "![<> CODE button](./images/Code_button.png)\n", "
\n", @@ -179,7 +174,7 @@ "
\n", "1. Click on the copy button next to the URL.\n", "2. Go to the Azure terminal\n", - "3. type: git clone *then paste the url you copied*\n", + "3. Type `git clone`, then paste the URL you copied. \n", "\n", "
\n", "The whole set of folders and proper file structure will instantly be in your notebook list.\n" @@ -193,7 +188,7 @@ "\n", "## Familiar, inelegant version: Downloading the contents of the repository then uploading\n", "\n", - "There isn't a normal download button that you can see in a repository. At Github, most of the materials are code that developers want to download. Thus, you will click on the green code button to get the #option# to download or to copy.\n", + "There isn't a normal download button that you can see in a repository. At Github, most of the materials are code that developers want to download. Thus, you will click on the green code button to get the **option** to download or to copy.\n", "
\n", "![<> CODE button](./images/Code_button.png)\n", "\n", @@ -205,7 +200,7 @@ "\n", "# \"Unzipping\" your folder\n", "\n", - "The folder of materials does not download in a ready-to-use format. It is compressed (\"zipped\") so it needs to be extracted| into its full form before you upload it to use as a tutorial.\n", + "The folder of materials does not download in a ready-to-use format. It is compressed (\"zipped\") so it needs to be extracted into its full form before you upload it to use as a tutorial.\n", "\n", "**To unzip a file on Windows:**\n", "\n", @@ -217,7 +212,7 @@ "![Unzip](./images/extractAll.png)\n", "\n", "
\n", - "4. Choose the destination folder where you want to extract the folder. (it is fine to extract them in the same folder the file is already)\n", + "4. Choose the destination folder where you want to extract the files (it is fine to extract them into the folder where the zip file already is).\n", "5. Click \"Extract\" to start the unzipping process. \n", "\n", "
\n", @@ -260,9 +255,9 @@ ], "metadata": { "kernelspec": { - "display_name": "conda_python3", + "display_name": "Python 3 (ipykernel)", "language": "python", - "name": "conda_python3" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -274,7 +269,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.16" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule 0/Submodule_0_Tutorial_2_JupyterNotebooks.ipynb b/Submodule 0/Submodule_0_Tutorial_2_JupyterNotebooks.ipynb index 80d3fac..726f6d8 100755 --- a/Submodule 0/Submodule_0_Tutorial_2_JupyterNotebooks.ipynb +++ b/Submodule 0/Submodule_0_Tutorial_2_JupyterNotebooks.ipynb @@ -51,7 +51,7 @@ "\n", "**What is a Jupyter Notebook?**\n", "\n", - "It is a tool that lets you write text and run computer code (typically R or Python) all mixed into a single file, making it easy to see and understand. That is, Instead of just writing programming code in one tool and the descriptions or explanations in another file or a textbook, it blends them together sequentially.\n", + "It is a tool that lets you **write text** and **run computer code** (typically R or Python) all mixed into a single file, making it easy to see and understand. That is, instead of just writing programming code in one tool and the descriptions or explanations in another file or a textbook, it blends them together sequentially.\n", "
\n", "Also, Jupyter lets you type that computing code into boxes (called \"cells\") and run each part separately. \n", "
\n", @@ -84,17 +84,19 @@ "![image.png](attachment:90fea28a-e4ae-4a89-a13a-280dbd0819fb.png)\n", "\n", "
\n", - "The top line is clearly text-based (the tool for that in Jupyter is called \"Markdown\")\n", + "The top line is text-based (the tool for that in Jupyter is called \"Markdown\")\n", "\n", "The next cell below is a **code cell** that has in it: \n", + "```\n", "#this is a code cell\n", "print(\"This is the output from a code cell\")\n", - "
\n", - "Finally, the bottom material is not in a cell but is the output (This is the output from a code cell)\n", - "
\n", - "The kind of code it is able to run is **Python**. If you look at the top right, you can see that this Jupyter Notebook is running Python 3 (Python language version 3) in the code cells. \n", + "```\n", + "Below this cell, the output rendered by running the cell can be found. \n", + "\n", + "The code cells are able to run Python code. If you look at the top right, you can see that this Jupyter Notebook is running Python 3 (Python language version 3) in the code cells. \n", "\n", "In the above image the code in the cells has already been run (Markdown is also a kind of code that renders the text in some format, so we \"run\" that too) Take a look at the same notebook before the two cells were executed:\n", + "\n", "![image.png](attachment:cc6d0953-136b-4038-a6fb-0992b25aec95.png)\n", "\n" ] @@ -107,7 +109,7 @@ "\n", "Jupyter Notebooks are useful because:\n", "\n", - "1. They are easy to Use – You don’t need to run an entire program at once. You can run small pieces of code step by step, see what happens, and fix mistakes as you go.\n", + "1. They are easy to use – You don’t need to run an entire program at once. You can run small pieces of code step by step, see what happens, and fix mistakes as you go.\n", "2. You can take notes while you code – Instead of keeping your notes in a separate document, you can write explanations right next to your code to make it easier to understand later. In this module, it means that notes can be given to you to describe the code.\n", "3. They help you to see your results clearly & immediately – If your code creates a table, chart, or graph, Jupyter Notebook will show it right inside the document.\n", "4. 
It's great for learning and sharing – You can save your notebook and send it to someone else so they can see your work and try it themselves.\n", @@ -133,7 +135,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To run a single code cell in Azure:\n", + "To **run** a single code cell in Azure:\n", "\n", "- Click inside the cell to select it.\n", "- Click the Run button (right-pointing triangle for 'play' in the toolbar)\n", @@ -186,7 +188,7 @@ "source": [ "# Viewing a Jupyter Notebook in Azure\n", "\n", - "To use a notebook, you'll need to OPEN a notebook. In Azure, after you've loaded in some notebooks, you need to open the folder, then double click on a notebook so it will open. IN the image below, the arrow shows where you can find the list of notebooks. Circled is the button to expand the notebook & shrink the file structure. \n", + "To use a notebook, you'll need to OPEN a notebook. In Azure, after you've loaded in some notebooks, you need to open the folder, then double click on a notebook so it will open. In the image below, the arrow shows where you can find the list of notebooks. Circled is the button to expand the notebook & shrink the file structure. \n", "\n", "![opening_notebook.png](./images/opening_notebook.png)" ] @@ -220,7 +222,7 @@ "\n", "# Adding code or markdown boxes\n", "\n", - "You can create a code box or a text (Markdown) box very easily.\n", + "You can create a code box or a text (markdown) box very easily.\n", "
\n", "In Azure, to add another box hover over the space above the left side of a current box, and the following options will appear:\n", "\n", @@ -247,13 +249,6 @@ "## Clean up\n", " In the main tutorials, you'll be reminded here to \"shut down your compute instance.\" Check out the Azure tutorial for more explaination." ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { @@ -272,7 +267,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule 0/Submodule_0_Tutorial_3_AzureML.ipynb b/Submodule 0/Submodule_0_Tutorial_3_AzureML.ipynb index 543c40f..b01a502 100755 --- a/Submodule 0/Submodule_0_Tutorial_3_AzureML.ipynb +++ b/Submodule 0/Submodule_0_Tutorial_3_AzureML.ipynb @@ -14,7 +14,7 @@ "metadata": {}, "source": [ "## Overview\n", - "This tutorial is intended to get you started on Azure ML with a \"compute instance.\"\n", + "This tutorial is intended to get you started on Azure ML with a **compute instance.**\n", "\n", "The process is not (novice) user-friendly *the first time.* You will have to set up a few things (accounts, subscriptions, workspaces) but these will be your defaults after the first time. It will thus be really easy to go back to or to use AzureML for other cloud computing modules provided at the NIGMS Sandbox. 
\n", "\n", @@ -94,7 +94,7 @@ "Cost Details:\n", "- Azure will provide the first $200 worth of charges, which is an enormous amount of computing if you are just learning and running notebooks\n", "- The cost is low (~ 15cents/hour for the smallest virtual computer)\n", - "- You will need a credit card, so they could charge you for use of computingafter the trial period has expired.\n", + "- You will need a credit card, so they could charge you for use of computing after the trial period has expired.\n", "\n", "### To create the subscription\n", "To create an Azure Machine Learning (AzureML) subscription where you can run the Jupyter Notebook tutorials, you need to first create a regular Azure subscription if you don't already have one. \n", @@ -117,9 +117,7 @@ "|**Student (Azure for Students)**| **Free $100 credit (no credit card needed) +free services**|**Verified students (with .edu email)**|\n", "|NIH Subscription|For CloudLab|See below|\n", "\n", - "4. Provide a credit card for billing that subscription, if applicable\n", - "\n", - "-------------------------------------------------------------------------------------" + "4. Provide a credit card for billing that subscription, if applicable.\n" ] }, { @@ -148,7 +146,7 @@ "source": [ "# Step 3: Create a workspace\n", "\n", - "AzureML requires that you do all of your cloud computing in a \"workspace. They imagine that you might have several distinct projects. You *can* run all of your NIH tutorials in a single workspace.\n", + "AzureML requires that you do all of your cloud computing in a workspace. They imagine that you might have several distinct projects. 
You *can* run all of your NIH tutorials in a single workspace.\n", "\n", "To create your workspace, select this button from the home screen of a different part of Azure: [Azure ML](https://ml.azure.com/)\n", "\n", @@ -200,7 +198,7 @@ "source": [ "## Step 4: Open a Jupyter notebook\n", "\n", - "In Azure, after you've loaded in some notebooks, you need to open the folder, then double click on a notebook so it will open. IN the image below, the arrow shows where you can find the list of notebooks. Circled is the button to expand the notebook & shrink the file structure. \n", + "In Azure, after you've loaded in some notebooks, you need to open the folder, then double click on a notebook so it will open. In the image below, the arrow shows where you can find the list of notebooks. Circled is the button to expand the notebook & shrink the file structure. \n", "\n", "![opening_notebook.png](./images/opening_notebook.png)\n", "\n", @@ -277,8 +275,7 @@ "metadata": {}, "source": [ "## Clean up\n", - "
Attention: To avoid unnecessary charges, please STOP your compute instance if you started one.
\n", - "(you did not NEED a compute instance to run anything on this page)" + "
Attention: To avoid unnecessary charges, please STOP your compute instance if you started one.
\n" ] } ], @@ -298,7 +295,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule 0/Submodule_0_Tutorial_3b_AzureML_CloudLab.ipynb b/Submodule 0/Submodule_0_Tutorial_3b_AzureML_CloudLab.ipynb index 3e87a2a..fc3b400 100755 --- a/Submodule 0/Submodule_0_Tutorial_3b_AzureML_CloudLab.ipynb +++ b/Submodule 0/Submodule_0_Tutorial_3b_AzureML_CloudLab.ipynb @@ -48,7 +48,7 @@ "source": [ "# Step 2: Using Azure\n", "\n", - "You should be able to return to the [AzureML module](Submodule_0_Tutoral_3_AzureML.ipynb), starting at step 3, to make a workspace and use your NIH subscription. The latter is what is paying for your cloud *computing* so you must have a subsription to select. \n" + "You should be able to return to the [AzureML module](Submodule_0_Tutoral_3_AzureML.ipynb), starting at step 3, to make a workspace and use your NIH subscription. The latter is what is paying for your cloud computing so you must have a subscription to select. \n" ] }, { @@ -57,8 +57,16 @@ "metadata": {}, "source": [ "# Conclusion\n", - "This mini-tutorial was designed to direct you to the NIH CloudLab information. You do not have to use CloudLab; you can use pay-as-you-go on Azure and other Cloud Computing platforms. Regardless of what tool you use, be sure to **always turn off the computer (\"compute instance\")** when you leave. Even tens-of-cents an hour can add up if you forget it's running." + "This mini-tutorial was designed to direct you to the NIH CloudLab information. You do not have to use CloudLab; you can use pay-as-you-go on Azure and other cloud computing platforms. Regardless of what tool you use, be sure to **always turn off the computer (\"compute instance\")** when you leave. Even tens-of-cents an hour can add up if you forget it's running." 
] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3a43538-9d16-47b0-9bbb-855675d125cb", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -77,7 +85,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule 0/Submodule_0_Tutorial_4_GitHub4You.md b/Submodule 0/Submodule_0_Tutorial_4_GitHub4You.md index 5106ce4..01e2982 100755 --- a/Submodule 0/Submodule_0_Tutorial_4_GitHub4You.md +++ b/Submodule 0/Submodule_0_Tutorial_4_GitHub4You.md @@ -33,8 +33,6 @@ By the end of this lesson, you will be able to: ## Prerequisites None -------------------------------------------- - # FAIR Data principles You may already have seen that the NIH requires researchers to share their data in ways that make it FAIR to maximize its value. Ensuring data is well-documented, openly available, and in standardized formats not only enhances transparency and collaboration but also aligns with NIH’s commitment to advancing scientific discovery. @@ -119,9 +117,9 @@ Before you can start using GitHub for your materials, you need to create an acco To get started, you need to sign up for a free GitHub account. This will give you access to your own profile, repositories, and collaboration tools. Follow the steps below to create your GitHub account. - Go to GitHub's website -- Click on Sign up in the top-right corner. +- Click on 'Sign up' in the top-right corner. - Enter your email address, username, and password. -- Click Create an account and follow the instructions. +- Click 'Create an account' and follow the instructions. - GitHub will send a verification email. Click the link in the email to verify your account.

@@ -156,12 +154,11 @@ A repository (A "repo") is like a folder where you store your research data and ### Instructions 1. Open GitHub Desktop and click “File” → “New Repository”. - ![NewRepository](./images/github_new_repository.png) 2. Give your repository a name (e.g., "Climate_Data_Study_2024"). 3. Choose a location **on your computer** where the repository will be stored. -4. Select Private (if you are using it for your lab group). You can name "collaborators" later (students, postdocs, etc) -5. Check “Initialize this repository with a README” (important for documenting your dataset). This is the appropriate spot to include summary information about this particular repository's purpose +4. Select 'Private' (if you are using it for your lab group). You can name **collaborators** later (students, postdocs, etc). +5. Check “Initialize this repository with a README” (important for documenting your dataset). This is the appropriate spot to include summary information about this particular repository's purpose. 6. Click Create Repository.

@@ -201,8 +198,8 @@ Once your repository is set up, you can start adding data files like Excel, CSV, ![CommitMessage](./images/commit_msg.png) -5. Click the bottom "Commit to _____" button (this saves the version to your local repository). -6. Click Push to Origin (this uploads your data to GitHub.com). +5. Click the bottom 'Commit to _____' button (this saves the version to your local repository). +6. Click 'Push to Origin' (this uploads your data to GitHub.com). ![PushImage](./images/push_origin.png) @@ -227,7 +224,7 @@ Using GitHub Desktop, you can track protocol changes alongside your datasets. 2. Edit or add a new protocol document (e.g., data_collection_protocol_v2.docx). 3. Open GitHub Desktop, and you’ll see the updated file. 4. Write a commit message (e.g., "Updated protocol to include new sensor calibration process") as above for data -5. Click Commit to main, then Push to Origin. +5. Click 'Commit to main', then 'Push to Origin'. Now, every protocol update is documented and timestamped, ensuring full transparency. diff --git a/Submodule 0/Submodule_0_Tutorial_5_ManagingGit.md b/Submodule 0/Submodule_0_Tutorial_5_ManagingGit.md index 088edcd..1b4822e 100755 --- a/Submodule 0/Submodule_0_Tutorial_5_ManagingGit.md +++ b/Submodule 0/Submodule_0_Tutorial_5_ManagingGit.md @@ -4,7 +4,7 @@ This tutorial helps research PIs and project leads set up GitHub to manage lab data with clarity and control. You'll learn how to protect the main branch, require review of changes, and maintain traceability across your team’s contributions. 
## Learning Objectives -In this tutorial, you will strengthen your abilities to: +In this tutorial, you will strengthen your ability to: - Explain the risks of uncontrolled data changes in research labs - Define key GitHub terms like commits, branches, pull requests, and main - Set up branch protection to safeguard official data and protocols @@ -13,7 +13,7 @@ In this tutorial, you will strengthen your abilities to: - Apply Git/GitHub tools to support data integrity and FAIR principles ## Prerequisites -Please complete tutorial 4 before tutorial 5. +Please complete Tutorial 4 before Tutorial 5. -------------------------------------------------------------- ## Why use Git for your research lab team *DATA?* @@ -105,7 +105,7 @@ main (or master, depending on your repo) ✅ Restrict who can push to matching branches (You can list yourself or other senior members who are allowed to approve changes) -Click Create or Save changes. +Click 'Create' or 'Save changes'. 👥 What This Looks Like for Your Collaborators Now, your students and research assistants won’t be able to push changes directly to the official version. Instead, they will: @@ -133,7 +133,7 @@ As the lab PI, you now have: ## Conclusion -You should now be able to use Github and the underlying tool of Git to keep track of and protect your data. However, it is not yet FAIR data because you need a fixed identifier for a set of data (even if you later update it) to inlcude with your journal articles. +You should now be able to use Github and the underlying tool of Git to keep track of and protect your data. However, it is not yet FAIR data because you need a fixed identifier for a set of data (even if you later update it) to include with your journal articles. For that information, view the [last tutorial on Git](Submodule_0_Tutorial_6_DOI.md) in this submodule. 
diff --git a/Submodule_1/Submodule_1_Tutorial1_PythonOverview.ipynb b/Submodule_1/Submodule_1_Tutorial1_PythonOverview.ipynb index 87be95c..8bcc85b 100755 --- a/Submodule_1/Submodule_1_Tutorial1_PythonOverview.ipynb +++ b/Submodule_1/Submodule_1_Tutorial1_PythonOverview.ipynb @@ -25,10 +25,10 @@ "- Find help for Python Tools\n", "- Use functions and methods on variables (at a beginner level)\n", "## Prerequisites\n", - "NONE\n", + "* No prior experience or setup required\n", "\n", "## Getting Started\n", - "Run the next code box(\"play\" button to the left or on the toolbar above) to installall the needed libraries for runningcode on this pagey" + "Run the next code cell (using the \"Play\" button to the left or from the toolbar above) to install all the necessary libraries for running the code on this page." ] }, { @@ -50,7 +50,7 @@ "source": [ "# Python Commands & Programs\n", "\n", - "We want to use Python to do some job. Thus, we create lines of code which, when run \"do something.\" This is more or less what we think of as a program\n", + "We want to use Python to do some job. Thus, we create lines of code which, when run \"do something.\" This is more or less what we think of as a program.\n", "
\n", "In Python, as in other languages, you build a program by creating these lines of code, often called commands, in a syntactically correct and logically organized way. We must do this because machines are dumb and will only do what we explicity tell them to do. \n", "
\n", @@ -73,7 +73,7 @@ "id": "5b3efa13-65aa-4d70-8db5-3134ee82c34d", "metadata": {}, "source": [ - "The Python compiler (the interpreter of code into something that executes) interprets a new line as a new commmand. It also interprets indentations as meaningful. You will see over time that indentation must align.\n", + "The Python compiler (the interpreter of code into something that executes) interprets a new line as a new command. It also interprets indentations as meaningful. You will see over time that indentation must align.\n", "
\n", "
You can tell the compiler to ignore material in the coding box by adding a # in front. Try this in the box above, adding # before print.\n", "\n", @@ -87,7 +87,7 @@ "id": "398865a6-a1a0-48ac-9a76-0af145f35a4f", "metadata": {}, "source": [ - "Python is an object-oriented language. This approach means that you do not have to upload every possible tool in the language to run each program. Rather, you pull in just the pieces you need. \n", + "Python is an **object-oriented language.** This approach means that you do not have to upload every possible tool in the language to run each program. Rather, you pull in just the pieces you need. \n", "
\n", "In contrast, software like Excel is ready for ANYTHING you might ask it to do. Unfortunately, loading EVERY tool just fills up the computer's memory and slows processes.\n", "
\n", @@ -207,12 +207,12 @@ "source": [ "## Conclusion \n", "In this lesson, you learned:\n", - "- Programs can be short lines of just text\n", - "- Comments are created with #\n", - "- Functions typically receive arguments in parenthesis and act on them, while method are actions that can be done on certain kinds of data\n", - "- To find out more background, write help(function)\n", - "- To edit functions or methods and run them on your own\n", - "- Python can import other functionalities (or not) to maximize performance efficiency\n", + "- Programs can be short lines of just text.\n", + "- Comments are created with #.\n", + "- Functions typically receive arguments in parentheses and act on them, while methods are actions that can be done on certain kinds of data.\n", + "- To find out more background, you can use help(function).\n", + "- To edit functions or methods and run them on your own.\n", + "- Python can import other functionalities (or not) to maximize performance efficiency.\n", "\n", "Time to go on to learn about [variables!](./Submodule_1_Tutorial2_Variables.ipynb)" ] @@ -243,7 +243,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_1/Submodule_1_Tutorial2_Variables.ipynb b/Submodule_1/Submodule_1_Tutorial2_Variables.ipynb index 0fc8799..d30226f 100755 --- a/Submodule_1/Submodule_1_Tutorial2_Variables.ipynb +++ b/Submodule_1/Submodule_1_Tutorial2_Variables.ipynb @@ -34,14 +34,13 @@ "source": [ "## Learning Objectives\n", "After this session, you will be able to :\n", - "- Recognize Python's rules for naming and using variables\n", - "- Recognize that variables can hold different data types which are dynamically typed (you don’t need to declare the type of a variable)\n", - "- Use basic string functions\n", - "- Allow you to capture user input to a variable\n", + "- Recognize Python's rules for naming and using variables.\n", + "- 
Recognize that variables can hold different data types which are dynamically typed (you don’t need to declare the type of a variable).\n", + "- Use basic string functions.\n", + "- Capture user input in a variable.\n", "\n", "## Prerequisites\n", - "\n", - "- Intro to python (tutorial 1)\n", + "- Submodule 1- Tutorial 1: Python Overview\n", " \n", "## Getting Started\n", "As will be true in every tutorial, please \"run\" the next code box to install needed packages for (in this case) the quizzes" ] }, { @@ -539,7 +538,7 @@ "Strings have some other special properties in Python.\n", "
\n", "
\n", - "Strings are also a sequence object, so are iterable. But, they cannot have characters substituted (they are \"immutable\")" + "Strings are also a sequence object, so are iterable. But, they cannot have characters substituted. This means they are \"immutable\"." ] }, { @@ -619,7 +618,7 @@ "
\n", "
\n", "To do this, use the str() function to convert a number variable to a string or a string to a number with the int() function:\n", - "

Tip: Try this: remove the str() around X in the \"Sarah is\" statement to see what happens if you treat an integer like a string WITHOUT converting it
" + "
Tip: Try this: remove the str() around X in the \"Sarah is\" statement to see what happens if you treat an integer like a string WITHOUT converting it.
" ] }, { @@ -658,8 +657,8 @@ "- How to display characters which you can't see (like tabs and spaces)\n", "- How to display a quotation mark\n", "
\n", - "
\n", - "For this, we use the escape character \\ (backslash)." + "\n", + "For this, we use the escape character \"\\\" (backslash)." ] }, { @@ -680,9 +679,9 @@ "\n", "\n", "
\n", - "The official Python documentation for use of the escape character: https://docs.python.org/3/reference/lexical_analysis.html#literals\n", + "The official Python documentation for use of the escape character: [link](https://docs.python.org/3/reference/lexical_analysis.html#literals)\n", "
\n", - "Try running the examples below which help you to deal with the requirement in Windows to use backslashes to designate subfolders:" + "Try running the examples below — they will help you deal with the requirement in **Windows** to use backslashes (`\\`) when specifying subfolders." ] }, { @@ -747,8 +746,7 @@ "We can add, divide, concatenate, assign values and many more things.\n", "
\n", "
\n", - "It’s important to understand that operators are \"polymorphic\".\n", - "- That is, they behave differently depending upon what it is they are operating on.\n", + "It’s important to understand that operators are \"polymorphic\". That is, they behave differently depending upon what it is they are operating on.\n", "
\n", "For example, the **+** operator adds numbers but concatenates strings." ] @@ -784,8 +782,7 @@ "print(\"Welcome, \", player1)\n", "\n", "# Square a Number. Again, even though we are asking for a number,\n", - "# the input will still be a string so we will have to convert\n", - "# using what is called a \"cast\" (see below).\n", + "# the input will still be a string so we will have to convert using what is called a \"cast\" (see below).\n", "# In this case, we are casting the string to a float (floating point number).\n", "num1 = input(\"Enter the number you would like me to square: \")\n", "num2 = float(num1)\n", @@ -801,9 +798,17 @@ "Now it’s your turn to apply what you have learned. \n", "\n", "A common unit of bioinformatics data is a FASTA DNA sequence. It looks like this:\n", - ">crab_anapl ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN). \r\n", - "MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLM\n", "\n", + "```\n", + ">crab_anapl ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN). \r\n", + "MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLM```R" + ] + }, + { + "cell_type": "markdown", + "id": "012aa05c-b112-4828-b8d2-e9234f56125e", + "metadata": {}, + "source": [ "The first line starts with a > then the name\n", "The second (and subsequent) line(s) have DNA, RNA, or protein sequences.R" ] @@ -817,10 +822,10 @@ "
\n", "\n", "\n", "\n", "The solution is at the end of this Jupyter notebook, after the wrap up." @@ -843,7 +848,7 @@ "source": [ "### Test Your Knowledge\n", "\n", - "Take the following quizzes to check your coding knowledge." + "Take the following quiz to check your coding knowledge." ] }, { @@ -865,9 +870,9 @@ "source": [ "# Conclusion \n", "\n", - "By now, you should have a basic grasp of some of the ways that variables can hold information that Python will process using scripts. .\n", + "By now, you should have a basic grasp of some of the ways that variables can hold information and be used in Python scripts.\n", "
\n", - "With that foundation, we will can look at more advanced [data structures](./Submodule_1_Tutorial3_DataStructures.ipynb) which will be necessary for Bioinformatics." + "With that foundation, we will look at more advanced [data structures](./Submodule_1_Tutorial3_DataStructures.ipynb)." ] }, { @@ -919,7 +924,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_1/Submodule_1_Tutorial3_DataStructures.ipynb b/Submodule_1/Submodule_1_Tutorial3_DataStructures.ipynb index b2843ef..3ad3c06 100755 --- a/Submodule_1/Submodule_1_Tutorial3_DataStructures.ipynb +++ b/Submodule_1/Submodule_1_Tutorial3_DataStructures.ipynb @@ -6,9 +6,10 @@ "metadata": {}, "source": [ "# Overview: Complex Data Structures\n", - "-------------------------------------------------------------------------------------------------\n", "\n", - "In this tutorial we will move past simple variables to explore more complex and mutable data types. We have already explored basic data types (string, integer). However, these basic types often fall short of the kinds of structures that would be helpful for Python programming in general and bioinformatics processing in particular. Two key data types are provided by Python: will be common to Python (list & dictionaries) while two classes will be imported from Biopython (Seq & SeqRecord)" + "In this tutorial, we will move beyond simple variables to explore more complex and mutable data types. We have already looked at basic data types such as strings and integers. However, these basic types often fall short of the kinds of structures that are useful for Python programming in general—and bioinformatics processing in particular.\n", + "\n", + "Python provides two key data types that are commonly used: lists and dictionaries. Additionally, we will import two important classes from Biopython: Seq and SeqRecord." 
] }, { @@ -18,8 +19,8 @@ "source": [ "## Prerequisites\n", "" ] }, @@ -89,10 +90,10 @@ "
\n", "
\n", "The contents of a list can be:\n", - "+ edited (they are mutable)\n", - "+ concatenated using $ + $\n", - "+ nested\n", - "+ indexed\n", + "+ edited (they are mutable).\n", + "+ concatenated using \"+\".\n", + "+ nested.\n", + "+ indexed.\n", "+ members can be accessed using [] brackets. Counting starts at 0 but indexing can be positive or negative (as we saw with strings.)" ] }, @@ -172,10 +173,9 @@ "id": "60a241ff-9ea0-4597-852b-23166d37d261", "metadata": {}, "source": [ - "Lists can be aldo be \"nested.\" In other words, a list can contain other lists, which can contain still more lists (or other variable types), and so on. \n", + "Lists can also be \"nested.\" In other words, a list can contain other lists, which can contain still more lists (or other variable types), and so on. \n", "<br>
\n", - "We will, in the code box, create a list with two elements\n", - "Each element is a list with a pair of words\n", + "We will, in the code box, create a list with two elements. Each element is a list with a pair of words.\n", "\n", "
Tip: Try to add another two-part element nested_list.
" ] @@ -214,7 +214,7 @@ "id": "4edb6835-0dc3-4869-b66f-711f711c84be", "metadata": {}, "source": [ - "
We can append, insert, and remove list elements.​
" + "We can append, insert, and remove list elements." ] }, { @@ -251,7 +251,7 @@ "So one element ends up holding an entire list, rather than just one element.\n", "
\n", "
\n", - "To fix this problem, use **extend**, rather than **append**:" + "To fix this problem, use **extend**, rather than **append**." ] }, { @@ -279,15 +279,11 @@ "source": [ "# Exercise\n", "\n", - "Test your skill:\n", - "
\n", - "- Create a list using the numbers 1 through 5​\n", - "
\n", - "- Create a list using the letters a-e​\n", - "
\n", - "- Add the lists together and print the result​\n", - "
\n", - "- Try multiply operator *, what happens?​" + "Test your skill:\n", + "- Create a list using the numbers 1 through 5\n", + "- Create a list using the letters a-e\n", + "- Add the lists together and print the result\n", + "- Try the multiply operator *. What happens?" ] }, { @@ -305,9 +301,7 @@ "source": [ "## Slicing\n", "\n", - "Slicing is a useful way to get a portion of a string or of a list. In bioinformatics, we might want to look at the first 3 bases or amino acids in each string in a FASTA file. \n", - "<br>
\n", - "Slicing has 3 arguments.\n", + "Slicing is a useful way to get a portion of a string or of a list. In bioinformatics, we might want to look at the first 3 bases or amino acids in each string in a FASTA file. Slicing has 3 arguments.\n", "\n", "- [ start : stop : increment ]\n", "\n", @@ -315,7 +309,7 @@ "\n", "With slicing, start/stop can be *negative*, implying a count from the end.\n", "\n", - "Increment default value is 1, but it can be any integer (including negative)." + "The default increment is 1, but it can be any nonzero integer (including negative)." ] }, { @@ -403,7 +397,7 @@ "metadata": {}, "source": [ "# Test Your Knowledge\n", - "Now it’s your turn to apply what you have learned in the following maatching quiz exercise. You **should feel free** to write the code in a Python cell if you cannot predict the outcome. " + "Now it’s your turn to apply what you have learned in the following matching quiz exercise. You **should feel free** to write the code in a Python cell if you cannot predict the outcome. " ] }, { @@ -432,18 +426,19 @@ "metadata": {}, "source": [ "In bioinformatics, you often need lookup tools or to access parts of a long set of data, without having to know the data's position in a list. The ideal tool for this is a Python **dictionary**. \n", - "<br>
\n", + "\n", + "\n", "In Python, a dictionary is a data structure that stores data in **key-value pairs.** As you can see in the next code box, an example is a dictionary for the genetic code.\n", "\n", "The basic properties of a dictionary are:\n", " + Key-Value Pairs: Each item in a dictionary consists of a unique key and its corresponding value. Think of it like a real-world dictionary where words are the keys and their definitions are the values. Or, for the genetic code, where AUG is the key and Met or Methionine is the value\n", " + Not indexed by position: Unlike lists, dictionary items are looked up by key rather than by position (since Python 3.7, dictionaries do preserve insertion order)\n", - " + Mutabile: You can add, remove, or modify items in a dictionary after its created\n", + " + Mutable: You can add, remove, or modify items in a dictionary after it's created\n", " + Accessing Values: You retrieve values by using their associated keys\n", " + Creating dictionaries: use curly braces {}\n", "<br>
\n", "\n", - "Let's examine a few common examples" + "Let's examine a few common examples." ] }, { @@ -485,7 +480,7 @@ "Other useful dictionary functions:\n", "\n", "- **del** (This is a keyword, for removing an element),\n", - "- **update()** (This is a method of adding new elements)\n", + "- **update** (This is a method of adding new elements)\n", "
\n", "
\n", "Let's see some examples:" @@ -622,23 +617,22 @@ "id": "dd897b29-c445-43c5-983b-3fdb59ed8d9a", "metadata": {}, "source": [ - "## SeqRecord\n", + "### The SeqRecord Object in Biopython\n", "\n", - "In Biopython, the SeqRecord object is used to hold a biological sequence along with its associated metadata.\n", + "In Biopython, the SeqRecord object is used to hold a biological sequence along with its associated metadata.\n", "\n", - "If you are familiar with GenBank or EMBL data structures, you can see that the SeqRecord is well-equipped to include the variety of informative data pieces that are used in addition to the simple DNA or Protein sequence alone.\n", + "If you are familiar with GenBank or EMBL data structures, you’ll see that SeqRecord is well-equipped to store the wide range of informative data often included in addition to the DNA or protein sequence itself.\n", "\n", - "A SeqRecord works a bit like a * dictionary * with some built-in keys that all files will have.\n", + "A SeqRecord behaves somewhat like a *dictionary*, with several built-in attributes that are commonly found across different biological file formats.\n", "\n", - "With a SeqRecord, the key attributes you can access directly are:\n", - "" + "With a SeqRecord, the key attributes you can access directly include:\n", + "\n", + "- id: The sequence identifier \n", + "- name: The sequence name \n", + "- description: A descriptive string \n", + "- seq: The biological sequence, stored as a Seq object \n", + "- features: A list of SeqFeature objects, which provide structured information about features like gene locations, protein domains, or other biological annotations \n", + "- dbxrefs: A list of database cross-references \n" ] }, { @@ -656,10 +650,9 @@ "id": "b98e62fd-c160-4cb9-bd75-d1d7188f02ed", "metadata": {}, "source": [ - "# Conclusion\r\n", - "After this Tutorial, you've learned to work with several different data structures that are common to bioinformatic data sets.\n", - "
\n", - "The next tutorial will use tools to analyze and import dataset using [Functions](./Submodule_1_Tutorial4_Functions.ipynb)." + "# Conclusion\n", + "In this tutorial, you've learned to work with several different data structures that are common to bioinformatic data sets.\n", + "<br>
In the next tutorial, we will use tools to analyze and import datasets using [Functions](./Submodule_1_Tutorial4_Functions.ipynb)." ] }, { @@ -688,7 +681,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_1/Submodule_1_Tutorial4_Functions.ipynb b/Submodule_1/Submodule_1_Tutorial4_Functions.ipynb index 8cd6437..ae0fcc5 100755 --- a/Submodule_1/Submodule_1_Tutorial4_Functions.ipynb +++ b/Submodule_1/Submodule_1_Tutorial4_Functions.ipynb @@ -16,15 +16,15 @@ "source": [ "## Learning Objectives\n", "In this lesson, you will:\n", - "<br>
\n", "* Learn the structure of Python functions\n", "* Call a function\n", "* Utilize bioinformatics functions from BioPython\n", "\n", "## Prerequisites\n", "\n", - "- Some experience in Python or\n", - "- Tutorials 1, 2, and 3\n", + "- Submodule 1 - Tutorial 1: Python Overview\n", + "- Submodule 1 - Tutorial 2: Variables\n", + "- Submodule 1 - Tutorial 3: Data Structures\n", "\n", "## Getting Started\n", "Run the code box below to import the required libraries" @@ -51,19 +51,19 @@ "source": [ "## Functions\n", "\n", - "In Python, a **function** is a block of reusable code that performs a specific task. It takes input, processes it, and optionally returns a value. It only runs when you \"call\" it \n", + "In Python, a **function** is a block of reusable code that performs a specific task. It takes input, processes it, and optionally returns a value. It only runs when you \"call\" it.\n", "\n", - "Lets try to use a very common bioinformatics task to illustrate the strcture of FUNCTIONS in python. \n", - "
\n", - "They are created with the key word def (you are DEFining the function). The function may be named any way you want and is logical to you.\n", - "
\n", - "Parentheses surround the variables that will be provided by the user of the function. Within the function, many different jobs can be done, including calculations and perhaps returning a value to you, such as you've already seen with the len(string) function that returns to the console the length of the string \"sent\" to it in the parentheses.\n", - "
\n", - "In the Python code box below we define a function, \"Count_base\", which needs two pieces of information: some kind of sequence and the base to be counted in the sequence list. Calling that function will give back (return) a number that represents the count of the base you ask for in the sequence you provide. (count is a built-in function in Python that can be used with strings).\n", + "Let’s try using a common bioinformatics task to illustrate the structure of **functions** in Python.\n", + "\n", + "They are created with the keyword `def` (you are **defining** the function). The function may be named in any way that makes logical sense to you.\n", + "\n", + "Parentheses surround the variables that will be provided by the user of the function. Within the function, many different tasks can be performed, including calculations, and possibly returning a value, such as what you've already seen with the `len(string)` function, which returns the length of the string passed to it.\n", "\n", - "The last line uses the keyword 'return' to tell Python to print to the console the \"result\" of the tasks that it runs.\n", + "In the Python code box below, we define a function called `count_base`, which needs two pieces of information: a sequence and the base to be counted in that sequence. Calling this function will return a number representing how many times that base appears in the sequence you provide. (`count` is a built-in method of Python strings.)\n", "\n", - "<div class=\"alert alert-block alert-success\">
Tip: Try changing the base or making the \"base\" multiple letters (e.g, \"aaa\") and running the Python code box again..
" + "The last line uses the keyword `return` to tell Python to print the result of the function’s operation to the console.\n", + "\n", + "
Tip: Try changing the base or making the \"base\" multiple letters (e.g., \"aaa\") and running the Python code box again.
\n" ] }, { @@ -85,7 +85,7 @@ "id": "dbefcfe0-4bc9-4d9e-bb4d-02e8831e2525", "metadata": {}, "source": [ - "
Tip: Try this: Instead of just returning the number, edit the function so that it returns \"g=17\" or \"In seq, g=17\".
" + "
Tip: Instead of just returning the number, edit the function so that it returns \"g=17\" or \"In seq, g=17\".
" ] }, { @@ -147,7 +147,7 @@ "id": "66a5efbb-8c5d-4fd8-a273-d247d4ad99bc", "metadata": {}, "source": [ - "Functions can also call another function, though to use these routinely you will need to learn to save these. (see ___ in tutorial). For now, lets write another function that calls our count_base function to calculate the GC %" + "Functions can also call another function, though to use these routinely you will need to learn to save these. For now, lets write another function that calls our count_base function to calculate the GC%. " ] }, { @@ -178,7 +178,7 @@ "id": "9a75092f-c6c4-4835-9c1b-2ef56344d809", "metadata": {}, "source": [ - "**Can you write your own tool that calculates the percentage of the time of all guanines that are found as the pairing AG?**" + "Can you write your own tool that calculates the percentage of the time of all guanines that are found as the pairing AG?" ] }, { @@ -194,15 +194,15 @@ "id": "b9065064-ed5d-4090-9312-278c0c40ab82", "metadata": {}, "source": [ - "There are many other ways that we might want to manipulate, align, and evaluate bioinformatic sets (e.g., FASTA sequences, both DNA and protein.) Fortunately, many of these standard functions have already been written and are freely available to everyone in Biopython: \"A set of python tools for computational molecular biology.\" (biopython.org) \n", + "There are many ways we might want to manipulate, align, and evaluate bioinformatic data sets—such as FASTA sequences, both DNA and protein. Fortunately, many standard functions for these tasks have already been written and are freely available through **Biopython**: *“A set of Python tools for computational molecular biology.”* (biopython.org) \n", "
\n", - "We will start here using the tools developed for \"sequence input and output\" (SeqIO). \n", + "We will begin by using tools developed for **sequence input and output** (`SeqIO`). \n", "
\n", - "We import that set of functions and tools (\"objects\") from the whole Biopython toolset with the following syntax: \n", + "We import that specific set of functions and tools (also called \"objects\") from the full Biopython toolkit using the following syntax: \n", "
\n", - "from Bio import SeqIO\n", + "`from Bio import SeqIO` \n", "
\n", - "Here we'll use that to look at a provided file (glut_human.fasta) that contains 4 different protein FASTA sequences, something that would be rather challenging for a novice Python programmer without the BioPython tools." + "We’ll use this to examine a provided file, **`glut_human.fasta`**, which contains four different protein FASTA sequences. Analyzing a file like this manually would be quite challenging for a novice Python programmer—Biopython makes it much easier.\n" ] }, { @@ -226,7 +226,7 @@ "id": "77555b0a-dcfc-448a-b30d-47e644c19c31", "metadata": {}, "source": [ - "You should see the 4 different protein identifiers in the file-- in this case with PDB ID numbers. \n", + "You should see the 4 different protein identifiers in the file- in this case with PDB ID numbers. \n", "\n", "There is a lot of information besides just the ID in each of these records, but it is not convenient to access the pieces yet. But, we can load all of that information into a single variable called (here) record_glut. The specific format is as a python LIST. " ] @@ -288,10 +288,10 @@ "metadata": {}, "source": [ "We wrote a function above (count_base) that can now come in handy to determine how many of any amino acid was present in that sequence. Although we conceived of it as a nucleotide counter, the mini program accepts whatever information we submit to it. \n", - "\n", + "```\n", "def count_base(dna, base):\n", " return dna.count(base)\n", - "\n", + "```\n", "We can send it the FASTA sequence of the GLUT protein and count an amino acid, rather than a base. The function takes any sequence and will count the letter you give it in quotes. This helps us to see how these functions \"think\" about the material you provide to it." 
] }, @@ -395,18 +395,21 @@ "id": "9bcdcc97-f4dc-411e-aca7-6ea24ca0c272", "metadata": {}, "source": [ - "Fetching records from NCBI using BioPython\r\n", - "The public databases of bioinformatics data have designed ways to access their extensive files without having to go through the GUI interfaces. We can collect the data to use in our analyses or comparisons in bioinformatics tasks.\r\n", - "\r\n", - "In biopython, the modules for doing so are found in Entrez. We must import those as well as the sequence input/output tools (SeqIO) to read and parse these complex files.\r\n", - "\r\n", - "The commands below will fetch and parse the specified genbank refseq file for the human insulin receptor 2 protein.\r\n", - "\r\n", - "(You can learn more about the different database names and how to use efetch from this book chapter: https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch)\r\n", - "\r\n", - "To run, you'll need to give a proper email address in to Entrez\r\n", - "\r\n", - "The data is read in (by convention) with the name \"handle\" but any variable name can be used. After reading in the information, one always closes the connection = handle.close()" + "Fetching Records from NCBI Using Biopython\n", + "\n", + "The public databases of bioinformatics data have built-in ways to access their extensive files **programmatically**, without needing to use graphical user interfaces (GUIs). This allows us to efficiently collect data for analysis and comparison in bioinformatics tasks. \n", + "
\n", + "In Biopython, the modules for accessing these databases are found in **Entrez**. We must import both `Entrez` (for data fetching) and `SeqIO` (for reading and parsing sequence files). \n", + "
\n", + "The commands below will **fetch and parse** a GenBank RefSeq file for the human *insulin receptor 2* protein. \n", + "
\n", + "🔗 You can learn more about database names and how to use `efetch` from this [NCBI book chapter](https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch). \n", + "
\n", + "**Note:** To run these commands, you must provide a valid email address to `Entrez`. \n", + "
\n", + "The data is conventionally read into a variable named `handle`, but any valid variable name can be used. Once the data is read, **you must close the connection** with: \n", + "\n", + "`handle.close()`" ] }, { @@ -436,12 +439,18 @@ "id": "9f35c7af-0412-413b-9d3d-636ce7c9eaec", "metadata": {}, "source": [ - "Reading this genbank file creates a variable of class SeqRecord (i.e, a \"Sequence record\") which is a bit like a \"list\" but it contains an ID, a sequence, and other identifying information. We can find out what \"things\" are in the file by asking for a directory of all the features.\n", + "Reading this GenBank file creates a variable of **class `SeqRecord`** (i.e., a *sequence record*), which behaves somewhat like a list but also includes useful attributes such as an ID, a sequence, and other identifying information. \n", + "<br>
\n", + "We can explore what components are included in the file by requesting a **directory of all the available attributes** using the `dir()` function.\n", "\n", - "We'll look at just a few parts of the sequence record in this less, so this will show just the attributes we are most likely to need. Look at what is in various aspects of the file by writing:\n", + "In this lesson, we'll focus on just a few key parts of the `SeqRecord`, highlighting the attributes most commonly used in bioinformatics workflows. \n", + "
\n", + "For example, to access the description field of the record, you can write:\n", "\n", + "```python\n", "humInsR2.description\n", - "for example. Any of the terms from the directory can be added after the . but only a few end up being useful to us." + "```\n", + "You can replace .description with any other attribute listed in the output of dir(humInsR2), although only a few are typically useful for common tasks." ] }, { @@ -545,7 +554,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_1/Submodule_1_Tutorial5_Project.ipynb b/Submodule_1/Submodule_1_Tutorial5_Project.ipynb index 94bf222..f35ee6c 100755 --- a/Submodule_1/Submodule_1_Tutorial5_Project.ipynb +++ b/Submodule_1/Submodule_1_Tutorial5_Project.ipynb @@ -7,14 +7,14 @@ "source": [ "# Project for Module 1:\n", "\n", - "Time to put all your new skills into practice! You will download 2 gene sequence files from the NCBI database using biopython, xyz, and align the two genes. \n", + "Time to put all your new skills into practice! You will download 2 gene sequence files from the NCBI database and align the two genes using the packages biopython and xyz.\n", "\n", "## Objectives\n", "\n", "You will compare the human alcohol dehydrogenase 1A gene (NM_000667.4) to a similar gene sequence from the American Mink (Neovison vison, XM_044226065.1), which is the E chain. \n", - "1. Determine the lengths of the two DNA sequences (hint: len(X) gives you the length of any file\n", - "2. Calculate the GC% in each sequence (hint: we wrote a tool called count that you could copy or recreate)\n", - "3. Perform a pairwise global alignment, obtaining the score (hint: tools within pairwise2 imported from bio)\n", + "1. Determine the lengths of the two DNA sequences (hint: len(X) gives you the length of any file.)\n", + "2. 
Calculate the GC% in each sequence (hint: we wrote a tool called count_base that you could copy or recreate.)\n", + "3. Perform a pairwise global alignment, obtaining the score (hint: tools within pairwise2 imported from Bio.)\n", "\n", "<div class=\"alert alert-block alert-success\">
Tip: If you need help, you can jump to the solutions from the next box.
" ] @@ -33,11 +33,10 @@ "metadata": {}, "source": [ "## Prerequisites\n", - "Before taking on this project, you should either do the 4 tutorials in Submodule 1 \n", - "
\n", - "or\n", - "\n", - "be familiar with Python variables, data structures, and functions, including BioPython tools" + "- Submodule 1 - Tutorial 1: Python Overview\n", + "- Submodule 1 - Tutorial 2: Variables\n", + "- Submodule 1 - Tutorial 3: Data Structures\n", + "- Submodule 1 - Tutorial 4: Functions" ] }, { @@ -74,7 +73,7 @@ "id": "4de267c8-f853-4b86-a557-21bd2b2b6084", "metadata": {}, "source": [ - "Align the two sequences (suggestion: if you'd like to see that it's working, you might want to align and display only a smaller portion of the file, e.g., ...(hum.seq[0:50], mink.seq[100:200])" + "Align the two sequences (suggestion: if you'd like to see that it's working, you might want to align and display only a smaller portion of the file, e.g., `...(hum.seq[0:50], mink.seq[100:200])`" ] }, { @@ -95,7 +94,7 @@ "id": "0c29003d-13db-4f43-8c92-8d3a9655d802", "metadata": {}, "source": [ - "**Congrats on completing your first solo python sequence import and alignment!**" + "Congrats on completing your first solo python sequence import and alignment!" ] }, { @@ -105,12 +104,10 @@ "source": [ "# Conclusion\n", "\n", - "You have completed the first module of the Introduction to Python tutorial!\n", + "You have completed the first module of the Introduction to Python tutorial!\n", "\n", "The next module will help you to develop more advanced data handling skills with NumPy, Pandas, and their powerful data science tools.\n", "\n", - "Start with the Overview page in Submodule_2\n", - "\n", "## Clean up\n", "Be sure to end your compute session to avoid unnecessary charges." 
] @@ -132,7 +129,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_1/Submodule_1_Tutorial5_Project_answers.ipynb b/Submodule_1/Submodule_1_Tutorial5_Project_answers.ipynb index 0abb268..8b780b3 100755 --- a/Submodule_1/Submodule_1_Tutorial5_Project_answers.ipynb +++ b/Submodule_1/Submodule_1_Tutorial5_Project_answers.ipynb @@ -21,12 +21,16 @@ "id": "35b54b8d-863b-41f7-9e10-4dc113b45525", "metadata": {}, "source": [ - "Its time to put all your new skills into practice! You will download 2 gene sequence files from the NCBI database using biopython, xyz, and align the two genes. \n", + "Time to put all your new skills into practice! You will download 2 gene sequence files from the NCBI database and align the two genes using the packages biopython and xyz.\n", + "\n", + "## Objectives\n", "\n", "You will compare the human alcohol dehydrogenase 1A gene (NM_000667.4) to a similar gene sequence from the American Mink (Neovison vison, XM_044226065.1), which is the E chain. \n", - "1. Determine the lengths of the two DNA sequences (hint: len(X) gives you the length of any file\n", - "2. Calculate the GC% in each sequence (hint: we wrote a tool called count that you could copy or recreate)\n", - "3. Perform a pairwise global alignment, obtaining the score (hint: tools within pairwise2 imported from bio)" + "1. Determine the lengths of the two DNA sequences (hint: len(X) gives you the length of any file.)\n", + "2. Calculate the GC% in each sequence (hint: we wrote a tool called count that you could copy or recreate.)\n", + "3. Perform a pairwise global alignment, obtaining the score (hint: tools within pairwise2 imported from bio.)\n", + "\n", + "
Tip: If you need help, you can jump to the solutions from the next box.
" ] }, { @@ -112,7 +116,7 @@ "id": "4de267c8-f853-4b86-a557-21bd2b2b6084", "metadata": {}, "source": [ - "Align the two sequences (suggestion: if you'd like to see that it's working, you might want to align and display only a smaller portion of the file, e.g., ...(hum.seq[0:50], mink.seq[100:200])" + "Align the two sequences (suggestion: if you'd like to see that it's working, you might want to align and display only a smaller portion of the file, e.g., `...(hum.seq[0:50], mink.seq[100:200])`" ] }, { @@ -169,7 +173,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Overview.ipynb b/Submodule_2/Submodule_2_Overview.ipynb index 9d1eaa7..732cdc3 100755 --- a/Submodule_2/Submodule_2_Overview.ipynb +++ b/Submodule_2/Submodule_2_Overview.ipynb @@ -9,7 +9,7 @@ "\n", "Module 2 takes us from introducing fundamental Python characteristics and functions towards data science. \n", "\n", - "The goal is to be able to use Python programming language to perform data analysis, manipulation, and visualization. This will enable you to use Python to extract meaningful insights from large datasets using the Python libraries of NumPy, Pandas, and Matplotlib.\n", + "The goal is to be able to use Python programming language to perform data analysis, manipulation, and visualization. This will enable you to use Python to extract meaningful insights from large datasets using the Python libraries NumPy, Pandas, and Matplotlib.\n", "\n", "The **NumPy** library expands Python's functionality, focusing on arrays of one or more dimensions. These are the foundation for scientific computing and processing of, especially, numerical data. 
\n", "\n", @@ -69,7 +69,7 @@ "source": [ "## Clean up\n", "\n", - "Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges you can do this by stopping the notebook instance in the **Cloud console**" + "Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges. You can do this by stopping the notebook instance in the **Cloud console**" ] } ], @@ -89,7 +89,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_1_NumPy.ipynb b/Submodule_2/Submodule_2_Tutorial_1_NumPy.ipynb index d5acd82..febb209 100755 --- a/Submodule_2/Submodule_2_Tutorial_1_NumPy.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_1_NumPy.ipynb @@ -5,9 +5,7 @@ "id": "e7cccaee", "metadata": {}, "source": [ - "# Tutorial 1: Numerical Python (NumPy)\n", - "\n", - "-----------------------------------------------------------" + "# Tutorial 1: Numerical Python (NumPy)\n" ] }, { @@ -17,12 +15,12 @@ "source": [ "## Overview\n", "\n", - "Are you ready to supercharge your research and data analysis? As a biologist, you're swimming in data—DNA sequences, protein structures, population stats, or experimental results. Processing all that manually or with spreadsheets? That’s the slow lane. Let me introduce you to NumPy, your new powerhouse for scientific computing!\n", + "Are you ready to supercharge your research and data analysis? As a biologist, you're swimming in data— DNA sequences, protein structures, population stats, or experimental results. Processing all that manually or with spreadsheets? That’s the slow lane. Let me introduce you to NumPy, your new powerhouse for scientific computing!\n", "\n", "**Why Should a Biologist Care About NumPy?**\n", "1. 
Handle Biological Data Like a Pro\n", "\n", - "Whether you’re analyzing gene expression data, creating protein matrices, or modeling population dynamics, NumPy makes it seamless to store, manipulate, and analyze huge datasets—faster and cleaner than Excel or vanilla Python.\n", + "Whether you’re analyzing gene expression data, creating protein matrices, or modeling population dynamics, NumPy makes it seamless to store, manipulate, and analyze huge datasets— faster and cleaner than Excel or vanilla Python.\n", "\n", "2. Speed Up Your Workflow\n", "\n", @@ -45,7 +43,7 @@ "- Dummy-code (one-hot-encode) categorical data into numerical data with a protein sequence\n", "\n", "## Prerequisites\n", - "- basic python\n", + "- Basic python knowledge\n", "\n", "## Getting Started\n", "- Run the next code box to prepare the quiz\n", @@ -73,14 +71,6 @@ "print(\"done installing required packages\")" ] }, - { - "cell_type": "markdown", - "id": "379832e3", - "metadata": {}, - "source": [ - "----------------------------------------------------------------------------------\n" - ] - }, { "cell_type": "markdown", "id": "8dcebcfb", @@ -88,7 +78,7 @@ "source": [ "## About Numpy\n", "Numerical Python​, or \"numpy\" is one of the most popular modules used in Python. Numpy is considered a foundational module for Python high-end scientific computing.​ \n", - "
​\n", + "\n", "The common standard in Python is to import numpy with the alias **np**.​ You will see how often you need to use this alias!\n" ] }, @@ -112,12 +102,10 @@ "metadata": {}, "source": [ "## The Numpy Array \n", - "Among the many components it provides is the **ndarray**.​\n", - "\n", - " - This is a high-performance array or vector and serves as one of the main classes for scientific computing.​\n", - "\n", - " - Only holds one type of element, much like a standard array.​\n", + "Among the many components it provides is the **ndarray**.\n", "\n", + " - This is a high-performance array or vector and serves as one of the main classes for scientific computing.\n", + " - Only holds one type of element, much like a standard array.\n", " - It is created with the **array()** function.\n" ] }, @@ -251,7 +239,7 @@ "\n", "\\[row number, column number]\n", "\n", - "You can also index a whole row or column using a colon, : , in place of the other dimension." + "You can also index a whole row or column using a colon,\":\" in place of the other dimension." ] }, { @@ -310,9 +298,9 @@ "# Test Your Knowledge\n", "(see the solution in the next markdown box)\n", "\n", - "> 1. Create an np array (\"let_arr\") of 10 elements (letters a-j). Note: there is no shortcut for a range of letters\n", - "> 2. Change the 2nd element to 'q' & print to the console\n", - "> 3. Change the 4th element to the string 'cat'. (does it work, what type is it?)" + "1. Create an np array (\"let_arr\") of 10 elements (letters a-j). Note: there is no shortcut for a range of letters.\n", + "2. Change the 2nd element to 'q' & print to the console.\n", + "3. Change the 4th element to the string 'cat'. Does it work, what type is it?" ] }, { @@ -367,8 +355,7 @@ "\n", "By reshaping your NumPy arrays, you adapt your data to fit specific tasks—statistical analysis, visualization, or preparing input for machine learning models. 
Instead of wasting time manually reorganizing data, you let NumPy do the heavy lifting so you can focus on your biology!\n", "\n", - "Numpy has built-in tools, as we've said, to work with arrays. A single array can be rearranged \n", - "We can reshape an array with **array.reshape()**" + "Numpy has built-in tools, as we've said, to work with arrays. A single array can be rearranged. We can reshape an array with **array.reshape()**" ] }, { @@ -394,10 +381,10 @@ "metadata": {}, "source": [ "## Test Your Knowledge\r", - " 1. Create a 2x5 array of the numbers 1-0\n", - "2. Transpose the array, change rows to columns, columns to rws\n", - "3. . Add the numbers 11 1 \n", - "4. 4. Reshape the array, make it a 2x2x3 array" + "1. Create a 2x5 array of the numbers 1-0.\n", + "2. Transpose the array, change rows to columns, columns to rws.\n", + "3. Add the numbers 11 and 1.\n", + "4. Reshape the array, make it a 2x2x3 array." ] }, { @@ -528,7 +515,8 @@ "metadata": {}, "source": [ "## Test your knowledge\n", - "> -Although you could easily look through the list yourself, use the gene_names and expression_levels arrays again to create a new clean_gene_name array that has removed all genes with zero expression. *This kind of filtering is commonly needed in biological data sets.*" + "\n", + "Although you could easily look through the list yourself, use the gene_names and expression_levels arrays again to create a new clean_gene_name array that has removed all genes with zero expression. *This kind of filtering is commonly needed in biological data sets.*" ] }, { @@ -638,7 +626,7 @@ "\n", "NumPy provides many functions you would expect in highly statistical applications such as data science. We have already been using some of them, such as enumerate and reshape\n", "\n", - "Operations that would take several lines of nested loops in Python can be done in one line with NumPy. 
Whether you're doing matrix multiplication, statistical analysis, or random sampling, NumPy's tools simplify and accelerate your workflow. We will focus here on ones you might need for bioinformatics. Once you see how these work, you will easily be able to edit and use other functions. The full list is in the [numpy documentation](https://numpy.org/doc/2.1/reference/routines.math.html)\n", + "Operations that would take several lines of nested loops in Python can be done in one line with NumPy. Whether you're doing matrix multiplication, statistical analysis, or random sampling, NumPy's tools simplify and accelerate your workflow. We will focus here on ones you might need for bioinformatics. Once you see how these work, you will easily be able to edit and use other functions. The full list is in the [numpy documentation](https://numpy.org/doc/2.1/reference/routines.math.html).\n", "\n", "- Mathematical functions\n", " * np.sum: Calculates the sum of elements along a specific axis\n", @@ -647,12 +635,11 @@ " * np.min / np.max: Find the minimum and maximum values in an array\n", " * np.std: Calculates the standard deviation\n", " * np.log10: Calculates the log of the value *commonly needed in biolinformatics*\n", - " * \n", "- Logical and Comparison Functions\n", " * np.where: Returns the indices of elements meeting a condition, or replaces elements based on a condition\n", " * np.any / np.all: Checks if any or all elements of an array meet a condition\n", - "- Tools to generate arrays\r\n", - " - np.random.normal\r\n", + "- Tools to generate arrays\r", + " * np.random.normal\r\n", "- For multidimensional arrays\r\n", " * row_means = np.mean(arr, axis=0)\r\n", " * column_means= np.mean(arr, axis =1)\r\n", @@ -729,9 +716,9 @@ "source": [ "
More on Arrays
\n", "\n", - "Arrays can also be split, i.e. taking a single array and breaking it up into multiple sub-arrays​\n", + "Arrays can also be split, i.e. taking a single array and breaking it up into multiple sub-arrays.\n", "\n", - "The following code takes one array and splits it into 3 (equal parts)​" + "The following code takes one array and splits it into 3 (equal parts)." ] }, { @@ -753,9 +740,9 @@ "id": "e22c5629", "metadata": {}, "source": [ - "NumPy allows you to conduct array searches using a **where()** method​\n", + "NumPy allows you to conduct array searches using a **where()** method.\n", "\n", - "The return value is an array of indexes where the search condition was satisfied​" + "The return value is an array of indexes where the search condition was satisfied." ] }, { @@ -777,11 +764,9 @@ "id": "97c395d4", "metadata": {}, "source": [ - "Finally you can sort arrays using the sort() method​\n", - "\n", - " - The return value is a copy of the array sorted​\n", + "Finally you can sort arrays using the sort() method. The return value is a copy of the array sorted.\n", "\n", - "Note the sort is only ascending. To do a descending sort you need to reverse the array using slicing​" + "Note the sort is only ascending. To do a descending sort you need to reverse the array using slicing." ] }, { @@ -803,9 +788,10 @@ "metadata": {}, "source": [ "## Test Your Knowledge\r", - "> - \n", - "Use a version of our previous gene expression array (arr = np.array([\"GeneA\", \"GeneB\", \"GeneC\", \"GeneD\", \"GeneE\", \"GeneF\", \"GeneG\", 5.1, 0.3, 8.7, 1.2, 6.5, 0.0, 2.3]) to perform math function\n", - "> - after you rearrange this into a 5x2 numpy array where the gene names are row 1. 
(you'll need to cope with the way that numpy interprets this single type array)" + " \n", + "Use a version of our previous gene expression array `arr = np.array([\"GeneA\", \"GeneB\", \"GeneC\", \"GeneD\", \"GeneE\", \"GeneF\", \"GeneG\", 5.1, 0.3, 8.7, 1.2, 6.5, 0.0, 2.3])` to perform math functions.\n", + "\n", + "First rearrange this into a 2x7 numpy array where the gene names are row 1. You'll need to cope with the way that numpy interprets this single type array." ] }, { @@ -921,13 +907,13 @@ "## Test your knowledge\n", "\n", "Can you make a similar tool for dummy coding a protein sequence? You'll use this to answer the quiz questions.\n", - "> - Import a protein sequence from NCBI as your sequence (the quiz questions are based on the FASTA sequence human leptin from the **protein** database, id XP_005250397.1.... To fetch them, you can use the project solution from module 1.)\n", - "> - Create a mapping matrix for the amino acids A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y\n", - "> - Create a one-hot array for the protein sequence.\n", - "> - Print the head or the tail of the one-hot matrix (the first 10 rows) for practice & to confirm\n", - "> - Get the index of lysine (K) from the array you make of the amino acids.\n", - "> - Count number of lysines for the whole sequence & display the value\n", - "> - Check that answer by using the string.count(character) method from module 1" + "* Import a protein sequence from NCBI as your sequence (the quiz questions are based on the FASTA sequence human leptin from the **protein** database, id XP_005250397.1.... 
To fetch them, you can use the project solution from module 1.)\n", + "* Create a mapping matrix for the amino acids A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y\n", + "* Create a one-hot array for the protein sequence.\n", + "* Print the head or the tail of the one-hot matrix (the first 10 rows) for practice & to confirm\n", + "* Get the index of lysine (K) from the array you make of the amino acids.\n", + "* Count number of lysines for the whole sequence & display the value\n", + "* Check that answer by using the string.count(character) method from module 1" ] }, { @@ -1024,7 +1010,7 @@ "source": [ "# Conclusion\n", "\n", - "In this tutorial, you have learned the many functions associated with the NumPy array. \n", + "In this tutorial, you have learned the many functions associated with the NumPy array.\n", "\n", "The next module introduces you to [Pandas](./Submodule_2_Tutorial_2_Pandas.ipynb)" ] @@ -1037,6 +1023,14 @@ "## Clean up\r\n", "Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges. You can do this by stopping the notebook **compute** instance from the Cloud console." ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0983f205-7499-43b7-b051-c4e996785872", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -1055,7 +1049,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_2_Pandas.ipynb b/Submodule_2/Submodule_2_Tutorial_2_Pandas.ipynb index cfbfeb4..0881bd7 100755 --- a/Submodule_2/Submodule_2_Tutorial_2_Pandas.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_2_Pandas.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "source": [ "# Tutorial 2: Pandas\n", - "-----------------------------------------------\n", + "\n", "## Overview\n", "NumPy is very fast at handling data. However, it is limited by allowing only ONE data type. 
Pandas adds the familiar data structure like a spreadsheet, with data organized in rows and columns with names. Why not just use Excel or Sheets, then? Because bioinformatics data sets, such as RNAseq results, are so large and you may need to do complex calculations on the data. Python tools are much faster, more powerful and more flexible than pre-packaged spreadsheet tools. Also, the data is typically not printed to the screen in a GUI interface so the processing speed is faster. If you process in the cloud, you can see substantial increases in speed of analysis.\n", "\n", @@ -54,10 +54,10 @@ "\n", "The scripting in Python is good till now, but what about data organization or handling columns of tabular data with different types? ​\n", "\n", - "Pandas focuses on **data management** which can be combined with **analytics tools**​\n", + "Pandas focuses on **data management** which can be combined with **analytics tools**.\n", "\n", "The core data type is a DataFrame. A DataFrame organizes data into rows and columns, making it easy to access, filter, and process.\n", - "![Structure of a Dataframe](./images/pandasDF.png)\n", + "![Structure of a Dataframe](./images/pandasDF.png).\n", "\n", "You can think of it like an Excel spreadsheet or (more appropriately) like a database table​\n", " - Tabular​\n", @@ -76,7 +76,7 @@ "1. Ease of Handling Tabular Data:\n", " - Rows can represent biological samples, sequences, or variants.\n", " - Columns can represent genes, features, or metadata.\n", - " - Unlike Excel, Pandas does not attempt to display all the values all the time, so it is less demanding on computer memory-- especially for the large datasets common to bioinformatics\n", + " - Unlike Excel, Pandas does not attempt to display all the values all the time, so it is less demanding on computer memory- especially for the large datasets common to bioinformatics\n", "\n", "2. 
Data Analysis: Perform operations like filtering, grouping, and summarizing efficiently.\n", " - Example: Find the top 10 most expressed genes in RNA-Seq data.\n", @@ -97,11 +97,11 @@ "### Creating Data Frames\n", "\n", "A Pandas dataframe can be constructed in many ways. (see [pandas documentation](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe) for more ways). Each column is called a **Series** and has it's own functions in Pandas [see more]\n", - "We can create a dataframe by passing in a dictionary of equal length lists​\n", + "We can create a dataframe by passing in a dictionary of equal length lists.\n", "\n", " * The dictionary keys will be column names​\n", "\n", - "We can also create dataframes from file loads and queries\n", + "We can also create dataframes from file loads and queries.\n", "\n", "Here, you see how it is made with a dictionary as the data that you might assemble yourself. " ] @@ -134,7 +134,7 @@ "\n", "Probably the most common way to create a Pandas dataframe is to **import a CSV** for further analysis. ​\n", "\n", - "You can also import excel, JSON, and the clipboard. [All the data types and import methods](http://pandas.pydata.org/pandas-docs/stable/io.html)\n", + "You can also import excel, JSON, and the clipboard. [All the data types and import methods](http://pandas.pydata.org/pandas-docs/stable/io.html).\n", "\n", "Try the next box to import a portion of a large cancer dataset. " ] @@ -207,11 +207,11 @@ "metadata": {}, "source": [ "### Summary Statistics\n", - "Are the number of rows as expected? What about the column names? General range of continuous variables? It's easy to asses​s (much more so than with a spreadsheet!!)\n", + "Are the number of rows as expected? What about the column names? General range of continuous variables? It's easy to asses​s (much more so than with a spreadsheet!)\n", "\n", - "A Pandas dataframe has a method (df.describe()) that can easily summarize each column. 
*The summary information is itself a dataframe* To get summary statistics on a single column, just specify that with the name in quotes as shown.\n", + "A Pandas dataframe has a method (df.describe()) that can easily summarize each column. *The summary information is itself a dataframe*. To get summary statistics on a single column, just specify that with the name in quotes as shown.\n", "\n", - "Can you obtain these same statistics for the cancer dataset? Note: these values could be meaningful for normalizing gene expression (later tasks)" + "Can you obtain these same statistics for the cancer dataset? Note: these values could be meaningful for normalizing gene expression(later tasks)." ] }, { @@ -261,8 +261,9 @@ "metadata": {}, "source": [ "As with np arrays, you can reference different dimensions. **df.loc** attribute accesses a group of rows and columns by label(s) or a boolean array in the given DataFrame:\n", - "df.loc [:, range] selects a set of columns\n", - "df.loc[range,:] selects a set of rows​\n", + "\n", + "- df.loc [:, range] selects a set of columns\n", + "- df.loc[range,:] selects a set of rows​\n", "\n", "Thus, you can subset some rows and some columns based on columns. (next, we'll make a subset based on characteristics like size of the population.)\n", "\n", @@ -291,7 +292,7 @@ "source": [ "### Indexes\n", "\n", - "We can select portions of the data frame by indexes (rows and columns) in a variety of ways" + "We can select portions of the data frame by indexes (rows and columns) in a variety of ways." ] }, { @@ -358,8 +359,8 @@ "metadata": {}, "source": [ "### Test your Knowledge\n", - "> 1. Efficiently determine how many states have a life expectancy between 70 and 71\n", - "> 2. What is the average HS Graduate percentage for these states?" + "1. Efficiently determine how many states have a life expectancy between 70 and 71\n", + "2. What is the average HS Graduate percentage for these states?" 
] }, { @@ -411,13 +412,7 @@ "source": [ "### Working with Text \n", "\n", - "Strings in Pandas are roughly the same as strings in Python​\n", - "\n", - "We simply operate on series rather than a single object​\n", - "\n", - "Pandas provides many of the same methods, such as **len()**, **lower()**, **upper()**, **split()** and others you have seen before​\n", - "\n", - "Pandas methods also usually ignore missing NaN values​" + "Strings in Pandas are roughly the same as strings in Python. We simply operate on series rather than a single object. Pandas provides many of the same methods, such as **len()**, **lower()**, **upper()**, **split()** and others you have seen before. Pandas methods also usually ignore missing NaN values." ] }, { @@ -536,9 +531,9 @@ "source": [ "### Adding/Deleting columns\n", "\n", - "We can add columns by naming a new column, then assigning a set of values​. If the added column doesn't have enough values, Pandas will automatically add NaN\n", + "We can add columns by naming a new column, then assigning a set of values​. If the added column doesn't have enough values, Pandas will automatically add 'NaN'.\n", "\n", - "Use the **del** keyword to delete a column​\n", + "Use the **del** keyword to delete a column.\n", "\n", "

Try this: You might have noticed that the first two columns of the cancer dataset are basically the same. Drop one of them & check the head to show you were successful.

" ] @@ -563,24 +558,25 @@ "id": "6d6e1af2", "metadata": {}, "source": [ - "### Adding/deleting ROWS\n", - "​\n", + "### Adding/deleting Rows\n", + "\n", "The syntax is slightly more complex to add a row to a Pandas dataframe. \n", "\n", "You can make a new row, then **concat**, that is to concatenate the data. This row should be a single dictionary for one row, or multiple dictionaries to add multiple rows. Since it's a dictionary, you don't have to put the data in the 'correct' order of the existing dataframe. \n", - "\n", + "`\n", "new_row= pd.DataFrame(\\[{key:value, key1:value1, key2:value2}]\n", - "\n", + "`\n", "If you do not give a value for each of the columns, then Pandas will use NaN\n", "\n", "Then, add that row of new data by concatenation. \n", + "`\n", "df = pd.concat(\\[df, new_rows], ignore_index=True)\n", - "\n", + "`\n", "**To delete**, and to directly replace the dataframe using the inplace tag:\n", - "\n", + "`\n", "df.drop(index=\\[Rowindex1, Rowindex2], inplace=True)\n", - "\n", - "If you just drop one row, use index=#" + "`\n", + "If you just drop one row, use index=#. " ] }, { @@ -627,7 +623,9 @@ "\n", "You can use that to your advantage (remembering that US VI will always be index 51) or we can reindex:\n", "\n", - "**df.reset_index(drop=True, inplace=True)**\n" + "`\n", + "df.reset_index(drop=True, inplace=True)\n", + "`" ] }, { @@ -684,7 +682,6 @@ "id": "aac18a37", "metadata": {}, "source": [ - "-----------------------------------------------------------------------------------\n", "## Data wrangling\n", "\n", "A large challenge when working with real biological datasets is data \"cleaning.\" This removes values or samples with errors (negative read counts, participants over age 110) or missing data (NaN) that could skew downstream analysis. This process must be carried out systematically and carefully. The normal workflow for this is beyond this introductory module. 
Please look at the NIGMS Sandbox Machine Learning Module for a full treatment on this topic.\n", @@ -1202,17 +1199,15 @@ "id": "b9892700", "metadata": {}, "source": [ - "We often want to apply functions to groups within our dataset. ​\n", + "We often want to apply functions to groups within our dataset. \n", "\n", - " - What is the mean value within group?​\n", + " - What is the mean value within group?\n", + " - What is the variance within group?\n", + " - Perform a linear regression within group, then get the slope estimates out.\n", "\n", - " - What is the variance within group?​\n", + "The first step is to establish the groups (SPLIT). \n", "\n", - " - Perform a linear regression within group, then get the slope estimates out.​\n", - "\n", - "The first step is to establish the groups (SPLIT)​\n", - "\n", - "In Pandas, we can use the **.groupby()** method for this Use states, and group on the variable we just created ​\n", + "In Pandas, we can use the **.groupby()** method for this Use states, and group on the variable we just created \n", "\n", "Note: \"Groupby\" makes a new groupby pandas object. The **agg** function in a Pandas groupby object is used to apply one or more aggregation functions to grouped data. It allows you to compute summary statistics like mean, sum, min, max, and more, across multiple columns or for specific groups." ] @@ -1239,7 +1234,7 @@ "id": "1a1428dc", "metadata": {}, "source": [ - "We can index the group-by with a dynamically created value" + "We can index the group-by with a dynamically created value." 
] }, { @@ -1260,11 +1255,9 @@ "source": [ "# Pandas Visuals \n", "\n", - "The next tutorial is all about python visuals (mainly using matplotlib), but because it is so important to data analytics, pandas provides visuals too​\n", - "\n", - "Pandas supports many types of plots​\n", + "The next tutorial is all about python visuals (mainly using matplotlib), but because it is so important to data analytics, pandas provides visuals too.\n", "\n", - " - Line, bar, area, box, histogram and scatter plots among others" + "Pandas supports many types of plots- line, bar, area, box, histogram and scatter plots among others. " ] }, { @@ -1360,9 +1353,9 @@ "metadata": {}, "source": [ "# Conclusion\n", - "You now have tools to do a LOT of manipulations of data frames with Pandas in Python. You are ready to work through a bioinformatics exercise. [Exploring ligand binding sites](./Submodule_2_Tutorial_2b_PDB_Pandas_Exercise.ipynb) in a pdb file (protein structure file) using pandas and biopandas\n", + "You now have tools to do a LOT of manipulations of data frames with Pandas in Python. You are ready to work through a bioinformatics exercise. [Exploring ligand binding sites](./Submodule_2_Tutorial_2b_PDB_Pandas_Exercise.ipynb) in a pdb file (protein structure file) using pandas and biopandas.\n", "\n", - "If you do not want to practice at this point, the next tutorial picks up with how to [visualize your data](./Submodule_2_Tutorial_3_VisualizingData.ipynb) with graphs in matplotlib\n" + "If you do not want to practice at this point, the next tutorial picks up with how to [visualize your data](./Submodule_2_Tutorial_3_VisualizingData.ipynb) with graphs in matplotlib.\n" ] }, { @@ -1371,7 +1364,7 @@ "metadata": {}, "source": [ "## Clean up\n", - "Remember to stop your Jupyter Notebook compute instance to avoid unnecessary charges.." + "Remember to stop your Jupyter Notebook compute instance to avoid unnecessary charges." 
] } ], @@ -1391,7 +1384,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_2b_PDB_Pandas_Exercise.ipynb b/Submodule_2/Submodule_2_Tutorial_2b_PDB_Pandas_Exercise.ipynb index e4ac61a..a736f2c 100755 --- a/Submodule_2/Submodule_2_Tutorial_2b_PDB_Pandas_Exercise.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_2b_PDB_Pandas_Exercise.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "source": [ "# Biopandas for PDB files\n", - "-----------------------------------------------------------------------------\n", + "\n", "## Overview\n", "If you are a structural biologist working with molecular structure files, a fantastic way to process pdb files is with Pandas dataframes. The tools are beyond the scope of this Introduction to Python course, but we include it here to give you a taste of how you can use traditional programming to query and calculate with these complex file types.\n", "\n", @@ -156,7 +156,7 @@ "### Step one: Get the file\n", "Assign the pdb file to the variable pdb PDB ID: 4AKE (adenylate kinase with a bound ligand). \n", "\n", - "As always, look at the first few lines of the file to make sure it is what you expected and to see the column names" + "As always, look at the first few lines of the file to make sure it is what you expected and to see the column names." 
] }, { @@ -357,7 +357,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_3_VisualizingData.ipynb b/Submodule_2/Submodule_2_Tutorial_3_VisualizingData.ipynb index 6e0866e..84e04ee 100755 --- a/Submodule_2/Submodule_2_Tutorial_3_VisualizingData.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_3_VisualizingData.ipynb @@ -5,8 +5,7 @@ "id": "7c73d63b-45c6-4e52-bc17-d40186fc602b", "metadata": {}, "source": [ - "# Tutorial 3: Visualizing Data\n", - "------------------------------------------------------------------------------------" + "# Tutorial 3: Visualizing Data\n" ] }, { @@ -50,7 +49,7 @@ "metadata": {}, "source": [ "# Matplotlib\n", - "The main plotting set of libraries in Python is **Matplotlib** . It can do 2D and 3D plotting in \"Matlab\" style, and even [animate plots](https://matplotlib.org/stable/users/explain/animations/animations.html#sphx-glr-users-explain-animations-animations-py) (Although that is beyond the scope of this introduction). \n", + "The main plotting set of libraries in Python is **Matplotlib** . It can do 2D and 3D plotting in \"Matlab\" style, and even [animate plots](https://matplotlib.org/stable/users/explain/animations/animations.html#sphx-glr-users-explain-animations-animations-py) (although that is beyond the scope of this introduction). \n", "\n", "It is typical to import the libraries (matplotlib.pyplot) as plt to simplify its use, much like we have been using np and pd as shorthand for NumPy and Pandas in this module. \n", "\n", @@ -59,10 +58,10 @@ "While it may seem easier to use Excel and its drop-down menus to make plots, there are some fantastic advantages with Matplotlib.\n", "1. You can print in a format with enough dots per inch (dpi) to make publication-quality images: fig.savefig('fig.tiff',dpi=600)\n", "2. 
While it takes a few more steps than excel:\n", - " - you CAN control all the characteristics\n", - " - the code is re-usable and can be applied to the next plot vs. having to start from scratch for each excel or even SigmaPlot figure.\n", - " - python is free\n", - " - you can plot much larger datasets\n" + " - you CAN control all the characteristics.\n", + " - the code is re-usable and can be applied to the next plot vs. having to start from scratch for each excel or even SigmaPlot figure.\n", + " - python is free.\n", + " - you can plot much larger datasets.\n" ] }, { @@ -72,9 +71,7 @@ "source": [ "## Making a plot with PyPlot\n", "\n", - "With Pyplot, we establish a figure, then annotate it and/or add all the necessary plotting elements. By default, the plot() function draws a line from point to point. \n", - "\n", - " 'Dose µM0, 0.1, 0.5, 1, 5, 10, 50, 100, 200, 5000 'Cell Viabilit drug 1y ()': [100, 98, 92, 85, 70, 60, 40, 25, 5Cell Viability with drug 2: 100, 99, 97, 94, 90, 85, 80, 75, 70, 65print(df)\r\n" + "With Pyplot, we establish a figure, then annotate it and/or add all the necessary plotting elements. By default, the plot() function draws a line from point to point. 
\r\n" ] }, { @@ -108,11 +105,11 @@ "\n", "\"Anatomy\n", "\n", - "At the top of the hierarchy is the **Figure** object, holding one or more **Axes**\n", + "At the top of the hierarchy is the **Figure** object, holding one or more **Axes**.\n", "\n", - "Below that are individual lines, grids, legends and text boxes, ticks and labels\n", + "Below that are individual lines, grids, legends and text boxes, ticks and labels.\n", "\n", - "This gives us a fine granularity and level of control over the plot\n" + "This gives us a fine granularity and level of control over the plot.\n" ] }, { @@ -196,25 +193,35 @@ "source": [ "## Histograms\n", "\n", - "A histogram is a type of bar plot where the X-axis represents the bin ranges while the Y-axis gives information about frequency\n", + "A **histogram** is a type of bar plot where the X-axis represents bin ranges, and the Y-axis shows the frequency of data points within those ranges.\n", + "\n", + "Histograms are appropriate for **continuous numerical data**. They provide insight into the **distribution** (or *density*) of data points.\n", + "\n", + "### Key Components of a Histogram:\n", + "\n", + "1. **Bins**: The range of values is divided into intervals called *bins*. The height of each bin represents the number of data points that fall within that interval.\n", + "2. **X-axis**: Represents the values or ranges of the data being plotted.\n", + "3. **Y-axis**: Represents the **frequency** (or count) of data points within each bin.\n", + "\n", + "---\n", + "\n", + "### Generating Histograms with NumPy & Matplotlib\n", "\n", - "Data that is appropriate for a histogram is continuous. Histograms show us the overall distribution of numerical data, so you are displaying the *density* of data points. \n", + "Histograms are typically created from arrays of data. 
In this first example, we’ll use **NumPy** to generate random distributions and **Matplotlib** to plot them.\n", "\n", - "The key variables of histograms that affect their appearance are \n", - "1. Bins: The range of values is divided into a set of intervals called bins. The height of each bin represents the frequency of data points falling within that interval\n", - "2. \n", - "X-axis: Represents the values or ranges of the data being plotetd.\n", - "3.\r\n", - "Y-axis: Represents the frequency or count of data points within eachin \n", + "Matplotlib will automatically choose the number of bins, but you **can** specify the number or size of bins manually. Try adjusting the number of bins to see how it affects the shape of the histogram. You can also explore different data distributions, such as exponential vs. normal.\n", "\n", + "---\n", "\n", - "The histogram plots an array of data. In this first demonstration, we will just create rayars os random distributions me bady NumPy\n", + "### Customizing Histogram Appearance\n", "\n", - "Matplotlib will automatically create bins, but you CAN choose the size. Play around with the # of bins and look at the alternative (exponential) array\n", + "You have many options to control how your histogram looks—bin count, bin width, color, edge styling, transparency, and more.\n", "\n", - "You have a lot of options for how your histogram could look. 
Here are some examples: \n", - "![HistogramTypes./images/](mpl_hist.bmp)\n", + "---\n", "\n", + "### Example Histogram Types:\n", + "\n", + "![HistogramTypes](./images/mpl_hist.bmp)\n", "\n" ] }, @@ -303,8 +310,9 @@ "source": [ "### Histograms in Pandas\n", "The central wrapper is DataFrame.plot() The default value is line plots You can change this with the kind argument: ‘bar’, ‘scatter’, ‘pie’ and others Thus, you can call for a histogram directly (the other plots require X and Y)\n", - "\n", - "df.plot.typeofplot\n" + "```\n", + "df.plot.typeofplot\n", + "```" ] }, { @@ -342,7 +350,8 @@ "source": [ "### Histogram Styles\n", "We can add arguments to plt.hist() for different styles.\n", - "Check help for plt.hist() " + "\n", + "Check help for plt.hist()." ] }, { @@ -392,10 +401,10 @@ "id": "6063ce47-bfba-46e3-9e5b-0db905d0532c", "metadata": {}, "source": [ - "The central wrapper is DataFrame.plot()\n", - "The default value is line plots\n", - "You can change this with the kind argument: ‘bar’, ‘scatter’, ‘pie’ and others\n", - "You can also call hist directly " + "The central wrapper is DataFrame.plot().\n", + "The default value is line plots.\n", + "You can change this with the kind argument: ‘bar’, ‘scatter’, ‘pie’ and others.\n", + "You can also call hist directly ." ] }, { @@ -422,6 +431,7 @@ "metadata": {}, "source": [ "Pandas - Annotate Title\n", + "\n", "Set the blank canvas in order to annotate it." ] }, @@ -445,8 +455,7 @@ "2. Optional Enhancements:\n", " * Additional arguments for customization, such as color (c), size (s), labels, etc.\n", "\n", - "**If you use pandas** you use df.plot(x='Column', y='Column, kind='scatter')\n", - "Pandas uses the column labels for X & Y axis.\n", + "**If you use pandas** you use df.plot(x='Column', y='Column, kind='scatter'). Pandas uses the column labels for X & Y axis.\n", "\n", "See this use for our states data." ] @@ -552,14 +561,12 @@ "\n", "There are two main ways to create subplots\n", "1. 
plt.subplot() (Simple Grid-Based Subplots):\n", - "\n", - " - Specify a grid layout (e.g., 2x2) and plot in a specific cell.\n", - " - Example: plt.subplot(2, 2, 1) means a 2x2 grid, and the plot is in the first cell.\n", + " - Specify a grid layout (e.g., 2x2) and plot in a specific cell.\n", + " - Example: plt.subplot(2, 2, 1) means a 2x2 grid, and the plot is in the first cell.\n", "\n", "2. plt.subplots() (More Flexible and Modern):\n", - "\n", - " - Creates a grid layout and returns figure and axes objects for better control.\n", - " - Use axes[i, j] to reference individual subplots." + " - Creates a grid layout and returns figure and axes objects for better control.\n", + " - Use axes[i, j] to reference individual subplots." ] }, { @@ -600,7 +607,7 @@ "\n", "The shape and location of the box and whiskers in a boxplot provide insights into the distribution of the data:\n", "\n", - "1. **Box (Middle 50% of Data):**The box represents the interquartile range (IQR), which is the range between the 25th percentile (Q1) and the 75th percentile (Q3). A wide box indicates high variability in the middle 50% of the data, while a narrow box suggests low variability.\n", + "1. **Box (Middle 50% of Data):** The box represents the interquartile range (IQR), which is the range between the 25th percentile (Q1) and the 75th percentile (Q3). A wide box indicates high variability in the middle 50% of the data, while a narrow box suggests low variability.\n", "2. **Line Inside the Box (Median):** The line inside the box shows the median (50th percentile), giving the central tendency of the data.\n", "If the median is closer to one end of the box, it indicates a skewed distribution.\n", "3. 
**Whiskers:** The whiskers extend from the box to the smallest and largest data points within 1.5 times the IQR.\n", @@ -717,7 +724,7 @@ "metadata": {}, "source": [ "## Clean up\n", - "Remember to shut down your Jupyter Notebook compute instance when you are done for the day to avoid unnecessary charges. " + "Remember to shut down your Jupyter Notebook compute instance when you are done to avoid unnecessary charges. " ] }, { @@ -792,7 +799,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_4_InferentialStatistics.ipynb b/Submodule_2/Submodule_2_Tutorial_4_InferentialStatistics.ipynb index 3ec8342..7570bb5 100755 --- a/Submodule_2/Submodule_2_Tutorial_4_InferentialStatistics.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_4_InferentialStatistics.ipynb @@ -5,8 +5,7 @@ "id": "ed5c8c48-8bae-4d86-ae10-8b83bc60b812", "metadata": {}, "source": [ - "# Tutorial 4: Introduction to Inferential Statistics in Python\n", - "------------------------------------------------------------------------------------------------------------------" + "# Tutorial 4: Introduction to Inferential Statistics in Python\n" ] }, { @@ -1801,7 +1800,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_5_PandasGuidedExercise.ipynb b/Submodule_2/Submodule_2_Tutorial_5_PandasGuidedExercise.ipynb index 1c8dd0b..494649c 100755 --- a/Submodule_2/Submodule_2_Tutorial_5_PandasGuidedExercise.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_5_PandasGuidedExercise.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "source": [ "# Pandas RNA-seq exercise\n", - "--------------------------------------------------------------------------------\n", + "\n", "## Overview\n", "\n", "## Learning Objectives\n", @@ -47,9 +47,9 @@ "metadata": {}, "source": [ "## 
Pandas Exercise: Analyzing RNA-seq Data\n", - "Scenario: You have obtained RNA-seq data from NCBI,and you need to analyze it using Pandas. The dataset contains information about gene expression levels across different samples. Your tasks involve data cleaning, normalization, and exploratory analysis.\n", + "Scenario: You have obtained RNA-seq data from NCBI,and you need to analyze it using Pandas. The dataset contains information about gene expression levels across different samples. Your tasks involve data cleaning, normalization, and exploratory analysis.\n", "\n", - "*you should be able to use other datasets, including your own!*\n", + "*You should be able to use other datasets, including your own!*\n", "\n", "**Recommended Dataset:**\n", "GEO Series Accession: GSE198050\n", @@ -73,9 +73,7 @@ "\n", "Some later steps will use techniques we have not yet covered in the tutorials.\n", "\n", - "If you get stuck, all the required coding can be found at the end. BUT, you should focus on **trying it yourself.**\n", - "\n", - "-----------------------------------------------------------------------------------------------" + "If you get stuck, all the required coding can be found at the end. BUT, you should focus on **trying it yourself.**\n" ] }, { @@ -183,7 +181,7 @@ "1. Create a variable to hold the sum of each column (df.sum())\n", "2. Divide the data (axis=1) by these count values & multiply by 1 million\n", "\n", - "The only challenge is that the first column (or 2?) is gene names. Remove the gene name columns, merging them back onto the dataframe after doing the math " + "The only challenge is that the first column (or 2?) is gene names. Remove the gene name columns, merging them back onto the dataframe after doing the math. 
" ] }, { @@ -267,8 +265,6 @@ "id": "59c4e3a9-a9ca-4b8b-851e-49a89a13be2b", "metadata": {}, "source": [ - "---------------------------------------------------------------------------------------\n", - "\n", "## Advanced steps\n", "The rest of the steps are provided below. You will need to adjust your variable names (or the ones below) to enable you to demonstrate the remaining steps!" ] @@ -516,7 +512,7 @@ "source": [ "## Conclusion\n", "\n", - "This exercise demonstrated some of the power of Python libraries (NumPy, Pandas, matplotlib, & statsmodel) to perform complex bioinformatics tasks and protein visualizations. After this guided exercise, it is time to tackle the [submodule 2 bioinformatics project](./Submodule_2_Tutorial_6_Project.ipynb) using the many skills you've learned in this module! (or jump to the [solved version](./Submodule_2_Tutorial_7_ProjectSolutions.ipynb) of the project)\n", + "This exercise demonstrated some of the power of Python libraries (NumPy, Pandas, matplotlib, & statsmodel) to perform complex bioinformatics tasks and protein visualizations. 
After this guided exercise, it is time to tackle the [submodule 2 bioinformatics project](./Submodule_2_Tutorial_6_Project.ipynb) using the many skills you've learned in this module (or jump to the [solved version](./Submodule_2_Tutorial_7_ProjectSolutions.ipynb) of the project)!\n", "\n", "### Clean up\n", "\n", @@ -540,7 +536,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_2/Submodule_2_Tutorial_6_Project.ipynb b/Submodule_2/Submodule_2_Tutorial_6_Project.ipynb index b7a8a90..7a7064c 100755 --- a/Submodule_2/Submodule_2_Tutorial_6_Project.ipynb +++ b/Submodule_2/Submodule_2_Tutorial_6_Project.ipynb @@ -5,8 +5,7 @@ "id": "3c16d197-b287-4bf2-af0c-d1712bc0f655", "metadata": {}, "source": [ - "# Submodule 2 Project: Predictors of Diabetes\n", - "------------------------------------------------------------" + "# Submodule 2 Project: Predictors of Diabetes" ] }, { @@ -28,7 +27,7 @@ "\n", "## Prerequisites\n", "\n", - "You should have completed all the tutorials in Module 2 and developed some level of comfort with the tools\n", + "You should have completed all the tutorials in Module 2 and developed some level of comfort with the tools.\n", "\n", "## Getting Started\n", "The 5 tasks are described below. 
Solutions for each part are provided in a [solved version](./Submodule_2_Tutorial6_ProjectSolutions.ipynb) of this notebook" @@ -210,7 +209,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_3/Submodule_3_Overview.ipynb b/Submodule_3/Submodule_3_Overview.ipynb index 7b2e8cd..3fab620 100755 --- a/Submodule_3/Submodule_3_Overview.ipynb +++ b/Submodule_3/Submodule_3_Overview.ipynb @@ -5,8 +5,7 @@ "id": "71797b6e-6170-4f01-9578-0e16390e75c8", "metadata": {}, "source": [ - "# Submodule 3 Overview: Python and Object-oriented programming\n", - "---------------------------------------------------------------------------------" + "# Submodule 3 Overview: Python and Object-oriented programming" ] }, { @@ -57,7 +56,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_3/Submodule_3_Tutorial_1_OOP.ipynb b/Submodule_3/Submodule_3_Tutorial_1_OOP.ipynb index c3d6458..d38f30b 100755 --- a/Submodule_3/Submodule_3_Tutorial_1_OOP.ipynb +++ b/Submodule_3/Submodule_3_Tutorial_1_OOP.ipynb @@ -5,8 +5,7 @@ "id": "81a299a1", "metadata": {}, "source": [ - "# Submodule 3 Tutorial 1: Object-Oriented Programming in Python\n", - "---------------------------------------------------------------" + "# Submodule 3 Tutorial 1: Object-Oriented Programming in Python" ] }, { @@ -18,9 +17,9 @@ "Objects in Python (and in Object-Oriented Programming) allow for modular, efficient, and scalable code by organizing data and functions into reusable units. 
This is especially useful in bioinformatics, where handling large datasets and complex analyses efficiently is critical.\n", "\n", "## Learning Outcomes\n", - "*AFter this tutorial, you should be able to:*\n", - "- Define class, object, methods, attributes, inheritance\n", - "- Write or edit these elements of a Python class " + "* After this tutorial, you should be able to:*\n", + " - Define class, object, methods, attributes, inheritance\n", + " - Write or edit these elements of a Python class " ] }, { @@ -262,7 +261,7 @@ "id": "4dc69382-963c-4c1f-b55f-5c741989bed2", "metadata": {}, "source": [ - "---------------------------------------------------------------------------------\n", + "\n", "
After the overview, now we delve deeper into the individual pieces.
" ] }, @@ -387,12 +386,12 @@ "metadata": {}, "source": [ "
Tip: Try these in the above mutation class code block:\n", - "
\n", + "\n", " - make a second mutation object (mut2) \n", " - query other pieces of mut1 and mut2 such as the position.\n", " - see what happens if you do NOT give all expected/required pieces of information. \n", " - add a new element of the variable (\"disease_associated\")
\n", - "
\n", + "\n", "*You should be able to tell that Python is using known classes to deal with the values entered, such as integers or strings*\n" ] }, @@ -640,15 +639,16 @@ "source": [ "## Conclusion\n", "\n", - "In this tutorial, you have learned:\n", - "- Some OOP vocabulary\n", - "- Defined a new class\n", - "- Written your own object\n", + "In this tutorial, you have:\n", + "- learned some OOP vocabulary.\n", + "- defined a new class.\n", + "- written your own object.\n", "\n", "You may be ready to try some advanced techniques in OOP in [Tutorial 2: OOP2](Submodule_3_Tutorial_2_OOP2.ipynb)\n", - "
\n", + "\n", "OR\n", - "Work with [modules and packages](Submodule_3_Tutorial_3_Modules&Packages.ipynb)" + "\n", + "Work with [modules and packages](Submodule_3_Tutorial_3_Modules&Packages.ipynb)." ] }, { @@ -677,7 +677,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_3/Submodule_3_Tutorial_2_OOP2.ipynb b/Submodule_3/Submodule_3_Tutorial_2_OOP2.ipynb index 7f02f18..68e5ebd 100755 --- a/Submodule_3/Submodule_3_Tutorial_2_OOP2.ipynb +++ b/Submodule_3/Submodule_3_Tutorial_2_OOP2.ipynb @@ -433,11 +433,11 @@ "We will examine two types of **error checking**: in the constructor and in a decorator.\n", "\n", "Error checking in the constructor (__init__) ensures that invalid data is caught immediately when an object is created. \n", - "
\n", + "\n", "Let's examine how to validate that the diagnosis is from an allowed list.\n", - "
\n", - "You can see that **before** we store the provided diagnosis value in self.diagnosis, we first eliminate issues with capitalization (the interpretation of strings is VERY literal, as you remember, such that diabetes is not Diabetes)\n", - "
\n", + "\n", "We also check that the provided diagnosis is in our short list of options. **If** it is, then we can store it as self._diagnosis\n" ] }, @@ -949,7 +949,7 @@ "id": "b58d4914", "metadata": {}, "source": [ - "Because functions are an object like any other variable, Python allows us to pass them just as we would any other varaible.\n", + "Because functions are an object like any other variable, Python allows us to pass them just as we would any other variable.\n", "- We have already seen functions used as parameters in closures, but it is even simpler than that. You can just create functions and pass them to any other function:" ] }, @@ -1058,7 +1058,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.16" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/Submodule_3/Submodule_3_Tutorial_3_Project.ipynb b/Submodule_3/Submodule_3_Tutorial_3_Project.ipynb index 34661f2..13212e4 100755 --- a/Submodule_3/Submodule_3_Tutorial_3_Project.ipynb +++ b/Submodule_3/Submodule_3_Tutorial_3_Project.ipynb @@ -18,15 +18,14 @@ "- Build scripts that load, filter, and summarize real-world datasets.\n", "\n", "## Prerequisites\n", - "- Submodule 1\n", - "- Submodule 2 (especially Pandas so you can import the dataset\n", + "- Submodule 1 Tutorials\n", + "- Submodule 2 Tutorials (especially Pandas so you can import the dataset)\n", "- Submodule 3 Tutorials\n", "\n", "## Getting Started\n", "\n", "Below, you will find a task prompt that will require you to define a new Class which can handle a dataset. You can attempt the task on your own, or use the guided prompts which are each followed by a \"fill-in-the-blank\" model. 
You can copy each of those sections, edit them, and use each to build a class then write a script. If you get stuck, the entire solution is in the next tutorial.\n", - "\n", - "-----------------------------------------" + "\n" ] }, { @@ -328,7 +327,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.4" + "version": "3.12.9" } }, "nbformat": 4,