Machine Learning X
A free, open-source Machine Learning & Data Science curriculum that’s currently under heavy development. This is a project by intechgration.
Make sure to install the following software on your machine:
Here are some of the most commonly used data formats. They store information in a way that is readable by both humans and machines, so the structured data they contain can be exchanged between systems and processed by programs.
CSV (Comma-Separated Values) is a simple file format used for storing and exchanging structured data, where each line represents a record or entry, and fields or columns within each record are separated by commas.
Understanding CSV Files (⏱️ 6min)
The sample.csv file mentioned in the video is available here. In short, CSV is a lightweight data format in which the fields (columns) of each record are separated by the , (comma) delimiter character. All spreadsheet apps (MS Excel, Google Sheets, Numbers, etc.) can read and write CSV files.
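As a quick illustration, here is a minimal Python sketch that uses only the standard-library csv module to parse a small CSV snippet and then write a row back out. The data is made up for the example; with a real file such as sample.csv you would pass an open file object instead of io.StringIO.

```python
import csv
import io

# Made-up CSV content; the first line is the header row
csv_text = """name,age,city
Alice,30,London
Bob,25,"New York, NY"
"""

# DictReader uses the header row as keys for every following record
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(row["name"], row["age"], row["city"])

# Writing: the default (Excel) dialect quotes any field that contains a comma
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["name", "city"])
writer.writerow(["Bob", "New York, NY"])
print(out.getvalue())
```

Note that CSV itself has no data types: every value comes back as a string (e.g. "30" rather than 30), so numeric columns have to be converted explicitly.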
Watch The Basics of YAML in Under 5 Minutes (⏱️ 4min)
Feel free to explore and become familiar with other related data formats and their variations, such as TSV (Tab-Separated Values), TOML (Tom’s Obvious, Minimal Language) and others.
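TSV follows the same idea as CSV, except that fields are separated by a tab character instead of a comma. A minimal sketch, again assuming Python's standard-library csv module, only needs a different delimiter:

```python
import csv
import io

# Made-up TSV content: fields are separated by "\t" instead of ","
tsv_text = "name\tcity\nAlice\tLondon\nBob\tNew York, NY\n"

# Same reader as for CSV, with the delimiter switched to a tab
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)
print(rows)
# [['name', 'city'], ['Alice', 'London'], ['Bob', 'New York, NY']]
```

Because the delimiter is a tab, a comma inside a value (as in "New York, NY") needs no quoting, which is one reason TSV is popular for text-heavy data.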
Resources:
SQL is the most widely used language in Data Science.
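To give a feel for what SQL looks like, here is a minimal sketch that runs a few statements against a throwaway in-memory SQLite database through Python's built-in sqlite3 module. The people table and its rows are invented for illustration; the statements themselves (CREATE TABLE, INSERT, SELECT ... GROUP BY) are standard SQL.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical example table with a few rows
cur.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO people (name, city, age) VALUES (?, ?, ?)",
    [("Alice", "London", 30), ("Bob", "Paris", 25), ("Carol", "London", 41)],
)
conn.commit()

# Number of people and average age per city
cur.execute(
    """
    SELECT city, COUNT(*) AS n, AVG(age) AS avg_age
    FROM people
    GROUP BY city
    ORDER BY city
    """
)
results = cur.fetchall()
for city, n, avg_age in results:
    print(city, n, avg_age)

conn.close()
```

The ? placeholders let sqlite3 insert the values safely instead of pasting them into the SQL string by hand.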
Watch and practice with Spreadsheets & SQL for Beginners (⏱️ 21min)
Take the Database Murder Mystery Challenge
Watch the Harvard CS50 Introduction to Databases with SQL course (⏱️ 11h)
“Python is the primary high-level language for Machine Learning and Data Science.”
You can practice Python online (without the need to install anything on your machine) through the PythonFiddle website.
Extra Resources:
Pick one of the following courses for a primer on Statistics (or watch both of them for an even better understanding of the topic):