Hands-On Data Scientist

A practical Data Scientist track for entering the job market and joining G-STAT

Expert, experienced lecturers, led by Asaf Elkan and Amir Lorch

An innovative course tailored to the job market – frontal lectures alongside online sessions


Academic hours

Of which about 160 are Hands-On practice hours tailored to the job market!

Why study with us

More practical

An intensive, hands-on training track: dozens of datasets, hundreds of exercises

Developing analytical thinking

Emphasis on developing critical thinking and an approach to business problems, not just technical tools

Tools for the job market

Learn the best practices that address the business problems most common in the market

Personal guidance

Weekly one-on-one time with a teaching assistant, for any question that comes up

Data analysis is our day-to-day

G-Stat is a company of about 230 analysts and Data Scientists

The most connected to the field

The course lecturers come from industry, live the world of data science, and, most importantly, know and love to teach

Our course materials

These include recordings, examples, exercises, and explanations, and will stay with you long after the course ends

Come work with us

In line with market demand, outstanding graduates of the course will be hired by G-Stat

Upcoming cohorts


Evening course – registration is open!

Curriculum

Data Scientist Course

Frontal teaching hours: 160
One-on-one tutoring hours with a teaching assistant: about 60
Practice, self-study, and final-project hours: about 100


Intro to DS & Statistics


EDA & Visualization


Supervised Learning


Unsupervised Learning


Deep Learning


Cloud Computing


The first module gives a general description of the data scientist role, common business use-cases, and how to solve them with machine learning and data science capabilities.

We will provide a detailed description of the course outline and introduce the basic libraries that we will use throughout the course.

In this module:

  • Business use-cases and solutions
  • What is data science, and what does a data scientist do?
  • The data science pipeline
  • Detailed course outline


The second module is dedicated to the statistical knowledge required of a data scientist.

To process big data and define key features, the data must be examined and evaluated in full using statistical methods.

In this module:

  • Mean, median, standard deviation
  • Random sample vs. population
  • Data distributions – focusing on the Gaussian
  • Statistical tests: Z, t, chi-squared
  • Type I and type II errors
  • Variable types
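As a minimal illustration of the descriptive statistics above, here is a sketch using only Python's standard library (the sample values are made up for the example):

```python
import statistics

# A small made-up sample
data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # middle value of the sorted sample
stdev = statistics.stdev(data)    # sample standard deviation

# Z-score: how many standard deviations an observation lies from the mean
z = (9 - mean) / stdev
print(mean, median, round(z, 2))
```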

In this module we will focus on exploring the data, its behavior, and its shape using a variety of graphs and visualizations. This process is a central part of a data scientist's day-to-day work. We will present the science behind analyzing large amounts of data and how to generalize findings to a population.

In this module:

  • Data visualizations
  • Outlier detection and treatment
  • Correlation and collinearity
  • Graphing with Matplotlib and Seaborn
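A common first step in outlier treatment is the 1.5 × IQR rule. The sketch below uses made-up values and approximates the quartiles by index position for brevity (real analyses would use a library's quantile function):

```python
# Simple outlier detection with the 1.5 * IQR rule
# (quartiles approximated by index position for brevity)
data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]

def iqr_outliers(values):
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

print(iqr_outliers(data))  # the extreme value 95 is flagged
```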

In this module, we will examine the steps and methods for selecting features for the future machine learning model. This section is devoted to data manipulation and selection.
We will present some of the most effective strategies for organizing data and selecting the most appropriate features, ensuring that the model can be trained in the most comprehensive manner.
In this module:
• Data cleaning
• Dealing with missing values
• Transformations and aggregations
• One-dimensional analysis
• Transformations in the time dimension
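One simple way to deal with missing values, mentioned in the list above, is mean imputation. A minimal sketch on made-up data, with `None` standing in for a missing value:

```python
# Mean imputation: replace missing values with the column mean
raw = [3.0, None, 4.0, None, 5.0]

present = [v for v in raw if v is not None]
col_mean = sum(present) / len(present)
filled = [col_mean if v is None else v for v in raw]
print(filled)
```

Mean imputation is only one strategy; depending on the data, dropping rows or using a model-based imputer may be more appropriate.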

In this module, we will explore the world of machine learning and the various existing models. We will understand which tasks can be solved with ML, which models suit each task, and what their basic assumptions are.

In this module:

  • Machine learning models
  • Supervised vs. unsupervised
  • Scikit-learn
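Every scikit-learn estimator follows the same fit/predict interface, which is worth seeing once on a toy example (the data here is invented: a single "hours studied" feature and a pass/fail label):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy labeled data: feature = hours studied, label = pass (1) / fail (0)
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

# The same fit/predict pattern applies to every scikit-learn model
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
print(model.predict([[2], [8]]))
```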

Supervised learning is a branch of machine learning where we train the model on labeled information, "solved examples". The learning itself is performed by searching for a hypothesis within the solution space and reducing the error.

In this module:

  • Overfitting and underfitting
  • Cross validation
  • Train / Test – stratified
  • Unbalanced data
  • Multicollinearity
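The stratified train/test split mentioned above keeps the label ratio identical in both parts of the data. A minimal sketch on an invented, perfectly balanced dataset:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10          # perfectly balanced labels

# stratify=y preserves the 50/50 label ratio in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

print(y_te.count(0), y_te.count(1))  # 5 of each class in the test set
```

Without `stratify`, a random split on unbalanced data can leave a class badly under-represented in the test set.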

Supervised learning – Regression

Regression is a technique for investigating the relationship between independent variables and a dependent variable. It is used as a method for predictive modeling in machine learning, where an algorithm is used to predict a continuous number.

  • Linear regression
  • Decision tree and forests
  • Time series
  • Performance measures (MSE, R^2, adjusted R^2)
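A minimal linear-regression sketch with the performance measures above, on made-up data that follows y = 2x + 1 exactly (so a perfect fit exists and R^2 should reach 1):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X = [[1], [2], [3], [4]]
y = [3, 5, 7, 9]                  # exactly y = 2x + 1

reg = LinearRegression().fit(X, y)
pred = reg.predict(X)

print(round(reg.coef_[0], 2), round(reg.intercept_, 2))  # slope, intercept
print(round(mean_squared_error(y, pred), 6), round(r2_score(y, pred), 6))
```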



A classification task assigns records to selected classes (for example, spam versus non-spam email: every email we receive is classified into one of these labels according to the model's decision). This is a function from the space of examples (our data) to a space of labels (the classes we want to categorize into). We will learn how to estimate the quality of the model and how to analyze its errors.

  • Logistic Regression
  • Decision trees
  • Ensemble and bagging
  • AdaBoost, XGBoost, Bagging
  • Neural networks
  • Performance measures (AUC, Accuracy, Confusion Matrix)
  • Deep learning
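The classification metrics listed above can be computed directly from predictions. The sketch below uses hypothetical spam-filter outputs (both the hard decisions and the probabilities are invented for the example):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Hypothetical spam-filter outputs: 1 = spam, 0 = not spam
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]                 # hard decisions
y_score = [0.1, 0.2, 0.6, 0.9, 0.8, 0.4]    # predicted probability of spam

print(confusion_matrix(y_true, y_pred))     # rows: true class, cols: predicted
print(round(accuracy_score(y_true, y_pred), 2))
print(round(roc_auc_score(y_true, y_score), 3))
```

Note that AUC is computed from the probabilities, not the hard decisions, so it measures ranking quality independently of the chosen threshold.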


Unsupervised learning uses machine learning algorithms to analyze and group unlabeled data sets (data with no "answer" or label). The algorithms uncover patterns or groupings in the data without human intervention, by reducing dimensionality, grouping by similarity, or explaining variance.

  • Unsupervised data
  • Dimension reduction
  • K-Means
  • Hierarchical clustering
  • PCA – principal component analysis
  • Performance measures for Clustering
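K-Means, from the list above, can be sketched in a few lines: given unlabeled points that form two obvious groups, the algorithm recovers the grouping on its own (the points are invented for the example):

```python
from sklearn.cluster import KMeans

# Six unlabeled points forming two obvious groups
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = list(km.labels_)
print(labels)  # first three points share one cluster, last three the other
```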






Model Optimization

In this module we will learn how to optimize the ML model and how each hyperparameter affects the learning of our model.


  • Hyper parameter tuning
  • Feature importance
  • Model optimization



In this module we will learn how to explain the model's results and how each feature contributes to them. It is not enough to produce a model with high accuracy: you also need to explain the reasons behind, and the business value of, the model's results, demonstrate that it does not discriminate against certain populations, and manage the model's accuracy over time.

  • Model explainability – Shap
  • Feature importance
  • Feature selection
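SHAP itself requires the external `shap` package, so as a lightweight sketch of the feature-importance idea, here is scikit-learn's built-in tree importance on invented data where only the first feature carries signal:

```python
from sklearn.tree import DecisionTreeClassifier

# Feature 0 fully determines the label; feature 1 is random noise
X = [[0, 5], [1, 3], [0, 9], [1, 1], [0, 2], [1, 7]]
y = [0, 1, 0, 1, 0, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(list(clf.feature_importances_))  # all importance lands on feature 0
```

SHAP values go further than this: they explain individual predictions, not just the model's global behavior.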


This module covers tools for collaborative data science: how to operate in a team, share code, and manage version control. We will also present advanced programming skills and platform proficiency.

In this module:

  • Jupyter notebook best practices
  • Python best practices
  • Git and versioning
  • How to operate in a team
  • Sharing information and code

Production environment and model monitoring

A production environment is a term primarily used to describe the environment in which software and models are actually used by end users for their intended purpose. A production environment can be considered a real-time environment where programs run for corporate or commercial activity. In this module we will see how to deploy an ML model to production, how to maintain it once it is there, and how to grant others access to it.

  • Deploying a model to production
  • Model drift and data drift
  • Bias over time
  • Serving a model through an API
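One common (and here deliberately simplified) step in shipping a model to production is serializing the trained object so the serving process can load it; a real deployment would wrap the restored model in an API. A sketch on a toy model:

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Train a toy model, then serialize it as it would be shipped to production
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

blob = pickle.dumps(model)     # bytes that could be written to disk / storage
restored = pickle.loads(blob)  # what the serving process would load

# The restored model makes identical predictions
print(list(restored.predict([[0], [3]])))
```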

In this module, we will present the most significant innovation in machine learning: deep learning. We will explore the different uses of deep learning and the network architectures that enable each of them. Text processing, image recognition, and extracting information from a video feed are some of the uses we will present.

In this module:

  • Evolution of deep learning
  • Perceptron, NN
  • PyTorch and TensorFlow
  • Image recognition (CNN)
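The perceptron listed above is the historical building block of neural networks, and its learning rule fits in a few lines without any framework (the course itself uses PyTorch and TensorFlow; this framework-free sketch only illustrates the idea, trained here on the AND function):

```python
# Minimal perceptron trained on the AND function
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.1         # learning rate

for _ in range(20):  # epochs
    for x, target in data:
        out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = target - out                  # perceptron update rule
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

preds = [1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0 for x, _ in data]
print(preds)  # the perceptron has learned AND
```

A single perceptron can only learn linearly separable functions; stacking layers of such units, with non-linear activations, is what gives deep networks their power.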









Cloud computing is the delivery of computing services (including servers, storage, databases, networks, software, analytics, and machine learning) over the Internet ("the cloud") to offer faster innovation, flexible resources, and knowledge sharing.

In this module:

  • Cloud computing Basics
  • Cloud platforms
  • Scaling

Deep learning on the “cloud”

Deep learning is one of the most advanced forms of machine learning, capable of processing and understanding both images and text. Deep learning requires significant compute (mainly GPUs). The most common solution is cloud computing: processing the data on compute resources in the cloud. Another common approach is a cloud-based data science platform. We will present the different uses of deep learning on one of the best-known data science platforms in the world.

In this module:

  • Machine learning on Cloud
  • Machine learning in Dataiku
  • Data science platform
  • Implementation of Image recognition model

The concluding project ties together everything learned during the course. As part of the project, students will tackle an unfamiliar data set and analyze it in depth: forming analytical questions, performing the analysis, building a model, and explaining the results.

The students will carry out a practical final project that simulates a real work environment, including:


1. Analysis of goals and business problems with the tools learned.

2. Work in a team accompanied and supported by an experienced data science team leader.

3. Work meetings and presentation of deliverables to a business stakeholder and decision maker.

Course leaders

Amir Lorch

Lead Data Scientist at G-STAT

Has managed dozens of DATA SCIENCE projects

Methodological developer of the training track

CPA Asaf Elkan

CEO of G-Academy

Over 8 years of experience in DATA training and mentoring,

Deputy head of the GRISK division at G-STAT

G-STAT
Among our clients

Leave your details and an expert advisor will get back to you shortly