Hands-On Data Scientist

A practical Data Scientist track for entering the job market and joining G-STAT

Expert, experienced lecturers, led by Asaf Elkan and Amir Lorch

An innovative course tailored to the job market – frontal lectures alongside online sessions


Academic hours

Of which about 160 are Hands-On practice hours tailored to the job market!

Why study with us

More practical

An intensive, hands-on training track: dozens of datasets, hundreds of exercises

Developing analytical thinking

Emphasis on developing critical thinking and an approach to business problems, not just technical tools

Tools for the job market

Learn the best practices that address the business problems most common in the market

Personal guidance

Weekly one-on-one time with a teaching assistant, for any question that comes up

Data analysis is our day-to-day

G-Stat is a company of about 230 analysts and Data Scientists

The most connected to the field

The course lecturers come from industry, live the world of data science, and, most importantly, know and love to teach

Our course materials

These include recordings, examples, exercises, and explanations, and will stay with you long after the course ends

Come work with us

In line with market demand, outstanding graduates of the course will be hired by G-Stat

Upcoming cohorts


Evening course – registration is open!

Curriculum

Data Scientist Course

Frontal teaching hours: 160
One-on-one tutoring hours with a teaching assistant: about 60
Practice, self-study, and final-project hours: about 100


Intro to DS & Statistics


EDA & Visualization


Supervised Learning


Unsupervised Learning


Deep Learning


Cloud Computing


The first module gives a general description of the data scientist role, common business use-cases, and how to solve them with machine learning and data science capabilities.

We will provide a detailed description of the course outline and introduce the basic libraries that we will use throughout the course.

In this module:

  • Business use-cases and solutions
  • What is data science, and what does a data scientist do?
  • The data science pipeline
  • Detailed course outline


The second module is dedicated to the statistical knowledge required of a data scientist.

To process big data and define key features, the data must be examined and evaluated in full using statistical methods.

In this module:

  • Mean, median, standard deviation
  • Random sample vs. population
  • Data distributions – focusing on the Gaussian
  • Statistical tests: Z, t, chi-squared
  • Type I and type II errors
  • Variable types
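As a minimal illustration of the descriptive statistics above, here is a sketch using only Python's standard library (the sample values are made up for the example):

```python
import statistics

# A small made-up sample
data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # middle value of the sorted sample
stdev = statistics.stdev(data)    # sample standard deviation

# Z-score: how many standard deviations an observation lies from the mean
z = (9 - mean) / stdev
print(mean, median, round(z, 2))
```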

In this module we will focus on exploring the data, its behavior, and its shape using a variety of graphs and visualizations. This process is a central part of a data scientist's day-to-day work. We will present the science behind analyzing large amounts of data and how to generalize findings to a population.

In this module:

  • Data visualizations
  • Outlier detection and treatment
  • Correlation and collinearity
  • Graphing with Matplotlib and Seaborn
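A common first step in outlier treatment is the 1.5 × IQR rule. The sketch below uses made-up values and approximates the quartiles by index position for brevity (real analyses would use a library's quantile function):

```python
# Simple outlier detection with the 1.5 * IQR rule
# (quartiles approximated by index position for brevity)
data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]

def iqr_outliers(values):
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

print(iqr_outliers(data))  # the extreme value 95 is flagged
```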

In this module, we will examine the steps and methods for selecting features for the future machine learning model. This section is devoted to data manipulation and selection.
We will present some of the most effective strategies for organizing data and selecting the most appropriate features, ensuring that the model can be trained in the most comprehensive manner.
In this module:
• Data cleaning
• Dealing with missing values
• Transformations and aggregations
• One-dimensional analysis
• Transformations in the time dimension
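One simple way to deal with missing values, mentioned in the list above, is mean imputation. A minimal sketch on made-up data, with `None` standing in for a missing value:

```python
# Mean imputation: replace missing values with the column mean
raw = [3.0, None, 4.0, None, 5.0]

present = [v for v in raw if v is not None]
col_mean = sum(present) / len(present)
filled = [col_mean if v is None else v for v in raw]
print(filled)
```

Mean imputation is only one strategy; depending on the data, dropping rows or using a model-based imputer may be more appropriate.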

In this module, we will explore the world of machine learning and the various existing models. We will understand which tasks can be solved with ML, which models suit each task, and what their basic assumptions are.

In this module:

  • Machine learning models
  • Supervised vs. unsupervised
  • Scikit-learn
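Every scikit-learn estimator follows the same fit/predict interface, which is worth seeing once on a toy example (the data here is invented: a single "hours studied" feature and a pass/fail label):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy labeled data: feature = hours studied, label = pass (1) / fail (0)
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

# The same fit/predict pattern applies to every scikit-learn model
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
print(model.predict([[2], [8]]))
```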

Supervised learning is a branch of machine learning where we train the model on labeled information, "solved examples". The learning itself is performed by searching for a hypothesis within the solution space and reducing the error.

In this module:

  • Overfitting and underfitting
  • Cross validation
  • Train / Test – stratified
  • Unbalanced data
  • Multicollinearity
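The stratified train/test split mentioned above keeps the label ratio identical in both parts of the data. A minimal sketch on an invented, perfectly balanced dataset:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10          # perfectly balanced labels

# stratify=y preserves the 50/50 label ratio in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

print(y_te.count(0), y_te.count(1))  # 5 of each class in the test set
```

Without `stratify`, a random split on unbalanced data can leave a class badly under-represented in the test set.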

Supervised learning – Regression

Regression is a technique for investigating the relationship between independent variables and a dependent variable. It is used as a method for predictive modeling in machine learning, where an algorithm is used to predict a continuous number.

  • Linear regression
  • Decision tree and forests
  • Time series
  • Performance measures (MSE, R^2, adjusted R^2)
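A minimal linear-regression sketch with the performance measures above, on made-up data that follows y = 2x + 1 exactly (so a perfect fit exists and R^2 should reach 1):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X = [[1], [2], [3], [4]]
y = [3, 5, 7, 9]                  # exactly y = 2x + 1

reg = LinearRegression().fit(X, y)
pred = reg.predict(X)

print(round(reg.coef_[0], 2), round(reg.intercept_, 2))  # slope, intercept
print(round(mean_squared_error(y, pred), 6), round(r2_score(y, pred), 6))
```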



A classification task assigns records to selected classes (for example, spam versus non-spam email: every email we receive is classified into one of these labels according to the model's decision). This is a function from the space of examples (our data) to a space of labels (the classes we want to categorize into). We will learn how to estimate the quality of the model and how to analyze its errors.

  • Logistic Regression
  • Decision trees
  • Ensemble and bagging
  • AdaBoost, XGBoost, Bagging
  • Neural networks
  • Performance measures (AUC, Accuracy, Confusion Matrix)
  • Deep learning
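The classification metrics listed above can be computed directly from predictions. The sketch below uses hypothetical spam-filter outputs (both the hard decisions and the probabilities are invented for the example):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Hypothetical spam-filter outputs: 1 = spam, 0 = not spam
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]                 # hard decisions
y_score = [0.1, 0.2, 0.6, 0.9, 0.8, 0.4]    # predicted probability of spam

print(confusion_matrix(y_true, y_pred))     # rows: true class, cols: predicted
print(round(accuracy_score(y_true, y_pred), 2))
print(round(roc_auc_score(y_true, y_score), 3))
```

Note that AUC is computed from the probabilities, not the hard decisions, so it measures ranking quality independently of the chosen threshold.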


Unsupervised learning uses machine learning algorithms to analyze and group unlabeled data sets (data with no "answer" or label). The algorithms uncover patterns or groupings in the data without human intervention, by reducing dimensionality, grouping by similarity, or explaining variance.

  • Unsupervised data
  • Dimension reduction
  • K-Means
  • Hierarchical clustering
  • PCA – principal component analysis
  • Performance measures for Clustering
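K-Means, from the list above, can be sketched in a few lines: given unlabeled points that form two obvious groups, the algorithm recovers the grouping on its own (the points are invented for the example):

```python
from sklearn.cluster import KMeans

# Six unlabeled points forming two obvious groups
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = list(km.labels_)
print(labels)  # first three points share one cluster, last three the other
```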






Model Optimization

In this module we will learn how to optimize the ML model and how each hyperparameter affects the learning of our model.


  • Hyper parameter tuning
  • Feature importance
  • Model optimization



In this module we will learn how to explain the model's results and how each feature contributes to them. It is not enough to produce a model with high accuracy: you also need to explain the reasons behind, and the business value of, the model's results, demonstrate that it does not discriminate against certain populations, and manage the model's accuracy over time.

  • Model explainability – Shap
  • Feature importance
  • Feature selection
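SHAP itself requires the external `shap` package, so as a lightweight sketch of the feature-importance idea, here is scikit-learn's built-in tree importance on invented data where only the first feature carries signal:

```python
from sklearn.tree import DecisionTreeClassifier

# Feature 0 fully determines the label; feature 1 is random noise
X = [[0, 5], [1, 3], [0, 9], [1, 1], [0, 2], [1, 7]]
y = [0, 1, 0, 1, 0, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(list(clf.feature_importances_))  # all importance lands on feature 0
```

SHAP values go further than this: they explain individual predictions, not just the model's global behavior.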


This module covers tools for collaborative data science: how to operate in a team, share code, and manage version control. We will also present advanced programming skills and platform proficiency.

In this module:

  • Jupyter notebook best practices
  • Python best practices
  • Git and versioning
  • How to operate in a team
  • Sharing information and code

Production environment and model monitoring

A production environment is a term primarily used to describe the environment in which software and models are actually used by end users for their intended purpose. A production environment can be considered a real-time environment where programs run for corporate or commercial activity. In this module we will see how to deploy an ML model to production, how to maintain it once it is there, and how to grant others access to it.

  • Deploying a model to production
  • Model drift and data drift
  • Bias over time
  • Serving a model through an API
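One common (and here deliberately simplified) step in shipping a model to production is serializing the trained object so the serving process can load it; a real deployment would wrap the restored model in an API. A sketch on a toy model:

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Train a toy model, then serialize it as it would be shipped to production
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

blob = pickle.dumps(model)     # bytes that could be written to disk / storage
restored = pickle.loads(blob)  # what the serving process would load

# The restored model makes identical predictions
print(list(restored.predict([[0], [3]])))
```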

In this module, we will present the most significant innovation in machine learning: deep learning. We will explore the different uses of deep learning and the network architectures that enable each of them. Text processing, image recognition, and extracting information from a video feed are some of the uses we will present.

In this module:

  • Evolution of deep learning
  • Perceptron, NN
  • PyTorch and TensorFlow
  • Image recognition (CNN)
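The perceptron listed above is the historical building block of neural networks, and its learning rule fits in a few lines without any framework (the course itself uses PyTorch and TensorFlow; this framework-free sketch only illustrates the idea, trained here on the AND function):

```python
# Minimal perceptron trained on the AND function
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.1         # learning rate

for _ in range(20):  # epochs
    for x, target in data:
        out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = target - out                  # perceptron update rule
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

preds = [1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0 for x, _ in data]
print(preds)  # the perceptron has learned AND
```

A single perceptron can only learn linearly separable functions; stacking layers of such units, with non-linear activations, is what gives deep networks their power.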









Cloud computing is the delivery of computing services (including servers, storage, databases, networks, software, analytics, and machine learning) over the Internet ("the cloud") to offer faster innovation, flexible resources, and knowledge sharing.

In this module:

  • Cloud computing Basics
  • Cloud platforms
  • Scaling

Deep learning on the “cloud”

Deep learning is one of the most advanced forms of machine learning, capable of processing and understanding both images and text. Deep learning requires significant compute (mainly GPUs). The most common solution is cloud computing: processing the data on compute resources in the cloud. Another common approach is a cloud-based data science platform. We will present the different uses of deep learning on one of the best-known data science platforms in the world.

In this module:

  • Machine learning on Cloud
  • Machine learning in Dataiku
  • Data science platform
  • Implementation of Image recognition model

The concluding project ties together everything learned during the course. As part of the project, students will tackle an unfamiliar data set and analyze it in depth: forming analytical questions, performing the analysis, building a model, and explaining the results.

The students will carry out a practical final project that simulates a real work environment, including:


1. Analysis of goals and business problems with the tools learned.

2. Work in a team accompanied and supported by an experienced data science team leader.

3. Work meetings and presentation of deliverables to a business stakeholder and decision maker.

Course leaders

Amir Lorch

Lead Data Scientist at G-STAT

Has managed dozens of DATA SCIENCE projects

Methodological developer of the training track

CPA Asaf Elkan

CEO of G-Academy

Over 8 years of experience in DATA training and mentoring,

Deputy head of the GRISK division at G-STAT

G-STAT
Among our clients

Leave your details and an expert advisor will get back to you shortly