How far are you from becoming an AI engineer?

Reprinted by Machine Heart

Source: Datawhale

This is a long article about how to become an AI algorithm engineer ~

Friends often send me private messages asking how to learn Python, how to write code, and how to get into the AI industry.

So I looked back at the journey I've traveled this year and summarized my experience.

Let’s see how far you are from becoming an AI engineer~

Specific content:

  • Why I started coding
  • Artificial intelligence/Machine learning/Deep learning
  • Self-study: how to find learning materials
  • How to choose a programming language/framework
  • Campus recruitment/Social recruitment/Internship/Interview experience
  • A bowl of chicken soup

Disclaimer:

  • The content of this article is my personal opinion. Take whatever experience is useful to you. If I've missed anything, please correct me so we can improve together!
  • Started my first internship in May 2017 / started writing code in July 2017 / graduated with a master's degree in November 2017
  • Programming languages I'm good at: R / Python
  • I never paid for classes and relied entirely on self-study. At first it was because I was broke, but later I found that the "open source" world is so beautiful!

Why did I start coding?

What was my first model?

My undergraduate major was mathematics and my graduate major was quantitative analysis, so my first internship was at a financial technology company, where I first came into contact with so-called "Fintech".

My first task was to build a credit scorecard model for customers, giving each user a credit score, similar to Alipay's Sesame Credit score. This is a standard model in banking, and the most common, traditional algorithm for it is logistic regression.
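To make the scorecard idea concrete, here is a dependency-free Python sketch under assumed inputs: logistic regression fitted by gradient descent on a made-up one-feature dataset (number of past overdue payments), followed by a common scorecard scaling that turns the predicted default probability into points. The data, the function names and the scaling constants (600 points at 50:1 good:bad odds, 20 points to double the odds) are all hypothetical; in practice you would use R's glm or scikit-learn and calibrate the constants to your business.

```python
import math

# Toy data (hypothetical): number of past overdue payments -> defaulted (1) or not (0)
X = [[0], [0], [1], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias with batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of one sample's log loss w.r.t. its linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def to_credit_score(p_bad, base_score=600, base_odds=50, pdo=20):
    """Scale a default probability to points: base_score at good:bad odds of
    base_odds, plus pdo points each time the odds double (assumed constants)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1 - p_bad) / p_bad
    return offset + factor * math.log(odds_good)


w, b = train_logistic_regression(X, y)
for overdue in (0, 5):
    p = sigmoid(w[0] * overdue + b)
    print(overdue, round(p, 3), round(to_credit_score(p)))
```

More overdue payments give a higher default probability and therefore a lower score, which is the direction a scorecard is meant to express.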

The tools we used in class were SAS and SPSS. They have a graphical interface with a very complete menu, so you can build a model with a few mouse clicks, which makes them easy to get started with. However, the annual SAS license fee is quite expensive, so most companies in Shenzhen choose the free, open source R or Python for data analysis and modeling. This shows how important it is to master a programming language.

Although this was a modeling task, the first three months had basically nothing to do with modeling. We were cleaning data, reshaping tables (spreads hands) and working with libraries, most commonly data.table and dplyr. There's no way around it: many models have packages you can call directly, which makes the modeling itself the simplest step.
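In R these steps are usually one-liners with data.table or dplyr. As a rough, dependency-free Python illustration of what that cleaning work involves, the sketch below removes duplicate records and fills missing numeric values with the column median. The column names and the median-fill choice are assumptions for illustration; the right imputation depends on the data and the business.

```python
from statistics import median


def clean_rows(rows, numeric_cols):
    """Drop exact duplicate records, then fill missing (None) numeric values
    with the column median. Returns new dicts; the input is left untouched."""
    seen = set()
    deduped = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            deduped.append(dict(row))
    for col in numeric_cols:
        present = [r[col] for r in deduped if r.get(col) is not None]
        fill = median(present) if present else 0
        for r in deduped:
            if r.get(col) is None:
                r[col] = fill
    return deduped


raw = [
    {"id": 1, "income": 3000, "age": 25},
    {"id": 1, "income": 3000, "age": 25},  # duplicate record
    {"id": 2, "income": None, "age": 40},
    {"id": 3, "income": 5000, "age": None},
]
cleaned = clean_rows(raw, ["income", "age"])
print(cleaned)
```

The fingerprint approach assumes every field value is hashable; median fill is only one of many strategies (mean, mode, a sentinel value, or model-based imputation).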

In fact, at the beginning I kept making very basic mistakes and hitting all kinds of errors: missing libraries, wrong punctuation, the same low-level slips again and again. I couldn't even read the error messages, so I didn't know how to fix them. If you are like me, please don't be discouraged. This is how I got here. Just solve the errors one by one~

When I didn't know anything, it felt really difficult. Every step had so many details to take care of, there was so much to learn, and after finishing each step I also had to understand what it meant for the business. But once you've done it all and look back, you'll find it wasn't that hard after all~

Is it easy to write code?

I'm not from a computer science background; I had barely written any code before.

Later I found that there are many kinds of programmers, such as front-end and back-end, so there are many kinds of code to write, which is why there are dozens of programming languages. The following figure shows some of the mainstream languages in recent years.

During the internship, my teammates often looked down on me. At first nothing I wrote would run, and the results were all wrong. I thought about giving up n times ("Why do I have to write this stuff?"), but I had n+1 reasons to keep going, because I really like what I'm doing. Why do I say "persistence"? Because it really isn't easy. It's not that it's difficult; it just takes a lot of patience.

At the beginning, I ran code line by line to get familiar with each command, reading it and running it over and over.

  • It took 3 months to go from writing my first line of code to finishing my first complete model.
  • Learning XGBoost: I spent 3 months on the theory, because the groundwork also included AdaBoost/GBDT and various other machine learning topics. Switching from R to Python took 1 month.
  • Moving from machine learning to automated machine learning took 2 months.
  • Going from NLP to building an intelligent Q&A robot took 1 month.
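For context on the AdaBoost/GBDT groundwork mentioned above, the following sketch shows the core idea of gradient boosting, which XGBoost builds on: start from a constant prediction and repeatedly fit a small tree to the current residuals. It is plain gradient boosting with depth-1 trees (stumps) on a single feature under squared loss, not XGBoost itself, which adds a regularized objective, second-order gradient information and many engineering optimizations. The data are made up.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: pick the threshold on a single feature
    that minimises the squared error of the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # a split must leave samples on both sides
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    if best is None:  # all x identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Plain gradient boosting with squared loss: each stump is fitted to the
    current residuals, which are the negative gradient of 0.5 * (y - f(x))**2."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(7), 3))
```

The learning rate shrinks each stump's contribution, so the fit approaches the targets gradually over many rounds instead of in one step; this shrinkage is one of the knobs you end up tuning in GBDT and XGBoost.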

From asking "What is overfitting? What is cross-validation? What is a loss function?" a year ago, to being able to follow the talks at the Global Artificial Intelligence Summit, I feel my hard work has not been in vain!

As you can see, once you've built up a foundation, you learn faster and faster.

Slowly my mentality changed from "Oh, another error? This is so frustrating" at the beginning to "Huh? No error? Something feels off, let me check again." Code has abused me thousands of times, but I keep coming back to it.

Some friends have already said they want to change careers. I never thought about it; I just kept going without even noticing. Because I love it, the more it abuses me, the less I can stop.

Summary

Set a very clear goal

Why did I put "Why did I start coding?" first? Because motivation really matters!

So when people ask me "How do I learn Python?", my first reply is "What do you want to use Python for?"

I had also used Python at school to write a web crawler demo or something similar, but because I had no real purpose, I soon put it aside. A clear goal looks like this: if you want to do NLP, you need to know that NLP applications include intelligent Q&A, machine translation, search engines and so on.

Then, if you want to do intelligent Q&A, you should know that the most advanced approach today is deep learning, with algorithms such as RNN/LSTM/Seq2Seq.

My clear goal came from the tasks I was given during the internship. Once the task is clear, the required language is clear and the algorithms to learn are clear, and many things follow naturally without fumbling around.

From finance to technology

AI has a very wide range of applications, and every research direction is endless. Financial companies rarely touch technologies such as image processing and NLP, so my strong curiosity made me decide to move to a pure tech company to find out more. I'm currently working on smart homes, with Jarvis as the goal.

⚡ Artificial Intelligence/Machine Learning/Deep Learning

I often see these words on bus billboards, as if a company without this technology is falling behind. There are also many other "-learnings", such as reinforcement learning, transfer learning and incremental learning. What is the relationship between these terms?

Machine learning is a subfield of artificial intelligence, and deep learning is a subfield of machine learning. When learning AI, start with machine learning.

The difference between "algorithms" in computer science and "algorithms" in mathematics

Theoretical knowledge is extremely important for AI algorithm engineers; writing code is just the process of implementing ideas. The "algorithms" here are not the same as the "algorithms" of computer science (CS): AI algorithms rest on mathematical derivation, so you still need a solid mathematical foundation, and the deeper you go, the higher the requirements. In my interviews I was rarely asked to write code by hand; 90% of the questions probed the details of models and algorithms.
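As an example of the kind of derivation interviewers ask for, here is the gradient of logistic regression's log loss in standard notation: m samples x_i with labels y_i in {0, 1}, weights w and bias b. The key step is that the derivative with respect to the linear score z_i collapses to p_i − y_i, which is exactly the error term gradient descent uses.

```latex
p_i = \sigma(z_i), \quad z_i = w^\top x_i + b, \quad \sigma(z) = \frac{1}{1 + e^{-z}}, \quad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

L(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]

\frac{\partial L}{\partial z_i} = \frac{1}{m} \left[ -y_i (1 - p_i) + (1 - y_i)\, p_i \right] = \frac{1}{m} (p_i - y_i)

\frac{\partial L}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (p_i - y_i)\, x_i, \qquad \frac{\partial L}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (p_i - y_i)
```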

At school I didn't like taking notes, or even attending classes. But ever since I fell into the machine learning rabbit hole, I've kept writing notes to this day~

Machine learning framework

Depending on whether the dataset has a Y value (a label), machine learning can be divided into supervised, semi-supervised and unsupervised learning. Typical supervised tasks are classification and regression; a typical unsupervised task is clustering.
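Since clustering comes up both here and in the interview questions, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm), probably the best-known clustering algorithm. The initialization (k randomly chosen data points with a fixed seed) and the toy points are assumptions; scikit-learn's KMeans, for instance, defaults to the smarter k-means++ initialization.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # init: k random data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No Y value is used anywhere: the groups come purely from distances between samples, which is what makes this unsupervised.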

The general machine learning process and related techniques are as follows:

This ML tree can grow many more branches. First get an overall feel, then tackle the pieces one by one. These knowledge points are also among the most frequently asked interview questions, so they are key! Anyone who has been through interviews will find them familiar.

How to get started with machine learning

Machine learning is so broad that beginners don't know where to start. To put it bluntly, machine learning is about making predictions with various models, so you need data. To get good results, you have to clean the raw, dirty data before you can use it. The information hidden in the data is sometimes invisible to the naked eye, so certain techniques are needed to mine useful information from it. All the techniques you rack your brains over serve one goal: making predictions more accurate. But no one can hit 100%.

Here is a brief introduction to three major areas: traditional machine learning (ML), image processing (CV) and natural language processing (NLP).

I also recommend an entry-level tool:

  • Kaggle (www.kaggle.com)

This is the world's most authoritative machine learning competition platform, and it has been acquired by Google. Its competition problems are very representative, and there are many free, high-quality datasets for you to use. Collecting data is the number one problem in machine learning, and Kaggle solves it for you. You don't have to enter a competition right away: download the data and play with it however you like. If you have no ideas, it's also great to search online for other people's solution notes and code and learn from them. Because everyone is competing together, you are not alone.

Introductory competitions to enter:

  • ML: Titanic
  • Image: digit recognition
  • NLP: sentiment analysis, Quora question semantic matching

After finishing the first one, Titanic, you should start to get a feel for it. I have done all four competitions above and think they are classic and well suited for beginners.

What are the introductory deep learning algorithms?

Today, input samples can be text, images or numbers.

Deep learning rose to fame through image processing, and the concept has now become even more popular than "machine learning".

Deep learning algorithms are mainly the neural network family. A recommended introductory series of CNN (convolutional neural network) architectures:

  • LeNet-5
  • AlexNet
  • VGG
  • GoogLeNet
  • ResNet
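All of the networks above stack the same basic building blocks. The plain-Python sketch below shows them on a tiny hand-made image: a "valid" 2-D convolution with stride 1 (strictly speaking the cross-correlation that deep learning frameworks compute), a ReLU, and 2×2 max pooling. It illustrates the operations only and is not an implementation of any of the listed architectures; the image and kernel are made up.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, stride 1: slide the kernel over the image
    and take the sum of element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise max(0, v)."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row/column is dropped."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# A 4x4 image with a vertical edge, and a kernel that responds to dark-to-bright edges
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1], [-1, 1]]
features = relu(conv2d(image, edge_kernel))
pooled = max_pool_2x2(features)
print(features)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(pooled)    # [[2]]
```

In a real CNN the kernel values are learned by backpropagation rather than hand-picked, and each layer applies many kernels across several input channels.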

Self-study: how to find learning materials?

Open source world, beautiful world ❤

"Open source", my love! The core idea of open source in the coding world is sharing and being free.

The machine learning community has a great atmosphere: much of what people share is comprehensive, and MLers are very helpful.

Here are several communities, forums and websites I visit often:

Kaggle (www.kaggle.com)

The world's most authoritative machine learning competition platform, acquired by Google. Its competitions cover traditional machine learning, NLP, image processing and more, and they are all very practical problems from all walks of life. Kaggle is one of the best and most complete ML communities. The open datasets that come with the competitions are very useful and well suited for beginners to practice on. It also offers job opportunities to outstanding Kagglers.

GitHub (www.github.com)

Jokingly called "the world's largest same-sex dating site" by Chinese developers, it's ideal for finding projects and open source communities. Let's star repos and read issues together~

Stack Overflow (www.stackoverflow.com)

Got a code error? Look it up here. Don't know how to write something? Look it up here! Basically every code-related pitfall has already been hit by someone before you.

CSDN (www.csdn.net)

The most down-to-earth gathering place for blogs and one of the sites I read most. I usually use it to look up detailed knowledge points or when I hit a code error.

sklearn (scikit-learn.org/stable)

A machine learning pro through and through! It includes code examples for all kinds of algorithms and techniques.

Medium (medium.com)

Its founder is a co-founder of Twitter, and it emphasizes high-quality content. Many Chinese AI WeChat public accounts repost articles from here. Every Medium author has their own insights, which are worth learning from and broaden your horizons. Access from mainland China requires a VPN ("scientific Internet access").

Towards Data Science (towardsdatascience.com)

Very similar to Medium; it also requires a VPN from mainland China.

Google AI Blog (ai.googleblog.com)

The blog maintained by the Google AI team, with at least one technical post every day. The Google developer conference just held in Shanghai announced that it will offer machine learning courses for free. It's worth following; after all, Google is an AI giant.

Technical blogs/personal websites of various experts. There are many of them, and I will keep updating the list on my personal blog from time to time.

AI open course platforms

First of all, I didn't attend classes or sign up for courses, but that's down to personal learning habits. Since everyone learns differently, I've still summarized the most highly rated course series. The prerequisite is a certain mathematical foundation; if you don't have one, you can make it up.

Coursera (www.coursera.org/browse)

Andrew Ng's Machine Learning

deeplearning.ai (www.deeplearning.ai)

fast.ai (www.fast.ai)

The founder of fast.ai is quite interesting: a master who swept Kaggle's image processing competitions, yet down-to-earth and not pretentious. Its central idea is that deep learning is simple, so don't be afraid. fast.ai has a blog and a community. Jeremy and Rachel encourage blogging, building projects and holding discussions at meetups, so that real ability replaces traditional certificates as proof of skill.

Udacity (in.udacity.com)

Has a Chinese version; courses cover programming basics, machine learning, deep learning and more.

NetEase Cloud Classroom

Fragmented time

The tech world has its trends too, and you'll get to know them once you fall down the rabbit hole.

Chase the latest papers, algorithms and competitions, and see who the Internet celebrities of the AI world are. If you can, open a Twitter account; browsing the machine learning crowd there for fun is quite entertaining, and there are plenty of self-deprecating comics~

I also recommend a few AI-themed American TV series that I love:

Silicon Valley (strongly recommended! It's basically my daily life and resonates way too much~ a comedy)

Westworld (don't get hung up on how this or that technology would be implemented while watching)

Practical tips

First-choice browser: Chrome

When reading English web pages, right-click and choose "Translate to Chinese (Simplified)".

I passed the IELTS and GMAT and used to be a kid who loved English. Now I'm on my knees, surviving in a sea of technical documentation.

Always search with Google. If that doesn't solve it, the problem is your search, not Google!

Baidu??? Ummm... don't make this hard for me... Learning how to ask questions is important.

Recommended search formats:

Language + question, for example: python how to convert a list to a dataframe

Copy the error message directly, for example: ValueError: No variables to save...

Throw all your questions at the search engine; searching online is faster than asking people! Always asking others will strain your relationships~

Learn to follow the trail

When you find a great technical article, don't rush to close it after reading. It may be a personal website: check whether the menu bar has an [About] option. Or it may be an excellent community: look for a [Home] option and read the other articles posted there.

Many excellent websites are in English, so a VPN ("scientific Internet access") is essential.

Learning costs don't come from courses but may come from hardware requirements. Students should make good use of school resources.

Summary

Although I've listed a lot, I still have to say: please stop hoarding huge amounts of material! Just find what you will actually use. (And please don't just ignore this advice.)

More material does not guarantee quality. Many courses or WeChat accounts just cram knowledge into you, and when you have questions no one answers them, which doesn't work well. It's like a model that is trained but never validated: pure cheating.

How to choose a programming language/framework

English first!!! (Ahem, I'm serious)

After all, a language is just a tool, so don't blindly chase any particular technology. Choose a language according to the task; different programmers choose different languages. Many people end up showing off their tools instead of focusing on ability, which misses the point.

In the machine learning community, R and Python are the two most widely used languages. Generally, use whichever one you like as long as it gets the job done, unless one is mandated.

After using Python, my feeling was: life is short, use Python.

How hard is it to build a model with Python?

Algorithm work roughly falls into two types. One is "calling packages and tuning parameters", done by ordinary algorithm engineers; the other is done by senior algorithm engineers, who can create algorithms themselves or flexibly modify other people's.

First, let's talk about how easy it is to build a model.

There are excellent frameworks that encapsulate algorithms:

TensorFlow / Caffe / Keras / ...

AutoML is an unstoppable trend

AutoML (automated machine learning): just throw in the data and wait for the results. A while ago Google's CloudML was very popular; its vision is to let everyone build models, but this kind of service costs money. So I studied the code of the open source auto-sklearn framework, and what did I find? How simple is modeling? As simple as 4 lines of code, enough to beat a modeler with 10 years of experience.
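auto-sklearn's real search is far more sophisticated, but the core loop of automated model selection fits in a few dozen lines: try several candidate models, score each with k-fold cross-validation, and keep the best. The plain-Python toy below is not auto-sklearn's API; the candidates (a majority-class baseline and k-nearest-neighbour classifiers) and the two-cluster data are hypothetical. It also shows the simplest way to implement cross-validation, a common interview question.

```python
import math
from collections import Counter


def majority_model(X, y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def knn_model(k):
    """Return a fit function for a k-nearest-neighbour classifier."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return fit


def k_fold_accuracy(fit, X, y, n_folds=3):
    """Mean accuracy over n_folds interleaved folds (sample i goes to fold i % n_folds)."""
    n = len(X)
    scores = []
    for f in range(n_folds):
        test_idx = set(range(f, n, n_folds))
        train_idx = [i for i in range(n) if i not in test_idx]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds


def auto_select(candidates, X, y):
    """Score every candidate with cross-validation and return the best name."""
    scores = {name: k_fold_accuracy(fit, X, y) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Two well-separated toy clusters, interleaved so every fold sees both classes
class0 = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5)]
class1 = [(5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (6, 5.5)]
X = [p for pair in zip(class0, class1) for p in pair]
y = [0, 1] * 6

candidates = {
    "majority": majority_model,
    "knn_k1": knn_model(1),
    "knn_k3": knn_model(3),
}
best, scores = auto_select(candidates, X, y)
print(best, scores)
```

Real cross-validation usually shuffles (and often stratifies) before splitting; the interleaved split here just keeps the toy deterministic. Knowing what happens inside this loop is what separates a modeler from someone who only presses the button.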

That said, if you have no idea what you're doing and can only produce results you can't take responsibility for, that's very bad, and you are not a qualified algorithm engineer. You must know your model as well as your own child. But as long as you want to, you can definitely do it!

What to install for learning Python

Anaconda

Right, it's that simple and crude. Install this and you're set.

Anyone learning Python faces the choice between Python 2 and Python 3. Language versions and environments are a real headache, but Anaconda amazed me: you can create custom Python environments and use py2 with your left hand and py3 with your right.

Several recommended Python IDEs:

Spyder

Comes with Anaconda. Its interface layout is very similar to RStudio and MATLAB, and results appear as soon as you run your input, which suits analysis work. I like to use it when writing small functions.

Jupyter Notebook

Also comes with Anaconda and runs in a web interface. It's handy when your program runs on a virtual machine and you want to tweak the code.

PyCharm

More convenient for writing projects or reading code, and especially useful when you have many Python files that import each other.

My laptop configuration

(This recommendation ignores budget constraints; feel free to skip it.)

Brand + Model: ThinkPad X1 Carbon

Recommended configuration: i7 + 16 GB RAM + 256 GB (or larger) hard disk

System recommendation: Linux, because it's open source and you can tinker with it freely

Campus recruitment/Social recruitment/Internship/Interview experience

How to plan for campus recruitment

Major companies open applications earlier, so keep a close eye on the online application deadlines:

  • 2019 autumn recruitment: July 2019 - November 2019
  • 2020 spring recruitment: February 2020 - April 2020
  • 2020 summer internships: March 2020 - May 2020
  • 2020 autumn recruitment: July 2020 - November 2020
  • (and so on)

Coding by hand

Start preparing six months in advance. My coding started with my internship, and after half a year of it I felt like I had leveled up. Don't bother with meaningless end-of-chapter exercises, and don't just retype the book's example problems; you'll forget them as soon as you've typed them. The book has already worked through every difficulty for you, so you won't grow that way.

Where to practice: the National College Student Mathematical Modeling Competition, Kaggle, Tianchi…

Project Experience/Internship Experience

If you know your career direction is AI/data mining, don't waste time applying for internships unrelated to technology. Serving tea, fetching takeout, running errands and printing won't help you. Back then, because my classmates were all heading off to internships one after another, an internship in the administrative department of a big company was right in front of me. I... actually hesitated for a moment, but fortunately I turned it down.

Try to get a technical internship at a big company; after all, it will be harder to get in later. But don't just do 3,000 yuan's worth of work because you're paid 3,000 yuan a month. Follow the entire project, understand its framework, structure and optimization directions, and try more things. Even if you work overtime (overtime is normal in Shenzhen), you come out ahead. Think about how to simplify repetitive work, and try to understand what your own department and other departments do and where they're headed. The more you know, the better you know what you want to do.

For the scorecard model in my internship, I used not only traditional logistic regression but also tried newer methods such as XGB. Although others were working on it too, I privately built the entire model myself, including data cleaning and model tuning, which gave me a much more thorough understanding of the business. Because I had done every detail myself, my interviews went more smoothly.
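Binning with WOE/IV is the classic companion to logistic regression in scorecards, and both formulas show up in the interview questions below. Given good/bad counts per bin, WOE measures how a bin's good:bad ratio differs from the overall ratio, and IV sums those differences into a single number for the variable's predictive power. This sketch uses the ln(%good / %bad) convention (some texts flip the sign, which leaves IV unchanged) and made-up counts; it assumes every bin has at least one good and one bad sample, since real pipelines usually smooth or merge empty bins.

```python
import math


def woe_iv(bins):
    """bins: list of (n_good, n_bad) counts, one pair per bin.
    Returns the WOE of each bin and the variable's total IV, using
    WOE = ln(%good / %bad) and IV = sum((%good - %bad) * WOE)."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes = []
    iv = 0.0
    for n_good, n_bad in bins:
        pct_good = n_good / total_good  # share of all goods falling in this bin
        pct_bad = n_bad / total_bad     # share of all bads falling in this bin
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv


# Hypothetical bins of an "age" variable: (good customers, bad customers)
age_bins = [(100, 40), (300, 30), (600, 30)]
woes, iv = woe_iv(age_bins)
print([round(w, 3) for w in woes], round(iv, 3))
```

Each bin's contribution to IV is non-negative because (%good − %bad) and WOE always share the same sign, so IV grows as the bins separate goods from bads more sharply.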

If you don't have an internship, the door is still open for us data miners. Kaggle has a playground designed specifically for data mining beginners, and there are many related competitions; major companies including Tencent and Alibaba also release algorithm competitions from time to time, and there will probably be more and more of them. If you stick with a project to the end, you can also earn a ranking on the platform. The higher, the better, hahaha. (That goes without saying.)

Common BAT interview questions (in no particular order)

  • Self-introduction/project introduction
  • How to handle class imbalance
  • Data standardization methods / how regularization is implemented / how one-hot encoding works
  • Why XGB performs better than GBDT
  • Data cleaning methods / data cleaning steps
  • Missing value imputation methods
  • Variable selection? Formula for information gain
  • How to implement cross-validation for a model
  • How to prune a decision tree
  • WOE/IV formulas
  • Binning methods / the principle behind binning
  • Derive SVM by hand: write out the objective function, computation logic and formulas; linearly separable vs. non-separable cases
  • Kernel functions
  • XGB principles/parameters, decision tree principles/advantages of decision trees
  • Familiarity with Linux/C/Java
  • How to deal with overfitting
  • Through what channels do you usually learn machine learning (a good question, worth preparing for)
  • Decision trees: pre-pruning vs. post-pruning
  • What loss functions are there
  • Do you lean toward data mining or algorithm research (a good question)
  • Bagging vs. boosting
  • Model evaluation metrics and how they differ
  • Explain model complexity / what model complexity is related to
  • Describe a clustering algorithm
  • How ROC is computed
  • How to judge a model's loss function and complexity
  • Decision trees vs. other models
  • Can decision trees handle non-numeric variables
  • Differences, advantages and disadvantages of decision trees vs. neural networks
  • What data structures are there
  • Model ensembling methods
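Two of the formulas above can be sketched directly in plain Python: the ROC curve (lower the decision threshold step by step and record the false positive rate and true positive rate) with AUC by the trapezoidal rule, and information gain (parent entropy minus the size-weighted entropy of the branches after a split). The function names and toy inputs are illustrative, and the ROC code assumes both classes are present.

```python
import math
from collections import Counter


def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold through the
    sorted scores; samples with tied scores cross the threshold together."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy before the split minus the size-weighted entropy of each branch."""
    n = len(labels)
    branches = {}
    for value, label in zip(feature_values, labels):
        branches.setdefault(value, []).append(label)
    conditional = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - conditional


y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(auc(roc_points(y_true, scores)))                       # 0.75
print(information_gain([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0
```

The AUC of 0.75 also equals the fraction of (positive, negative) pairs the scores rank correctly (3 of 4 here), which is a useful cross-check to mention in an interview.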

Summary

Questions are scattered, but knowledge is connected: when learning, master the small details within a big framework.

When you have spare time, browse recruitment websites to see what the market demands. Times change very quickly, and you need to train your ability to capture information. Things worth watching: job title/career direction/required programming languages/required algorithms/salary/...

At the end of each interview, the interviewer will ask whether you have any questions for them. Note that this question is also critical.

For example: What projects is this team currently working on? What languages and algorithms are used to implement them? ...

Try not to ask whether there's overtime or overtime pay; don't ask me why I say that (spreads hands).

If you run into something you don't understand during the interview, say C++ syntax you don't know, you can ask what C++ is used for in the project. Good questions can rekindle the interviewer's interest in you and improve your chances of success.

Fresh graduates should prepare for campus recruitment. Don't be lazy, don't be afraid of failing or being rejected, and get back up wherever you fall. Social recruitment (experienced hiring) is not something you can just walk into, and it will be even more frustrating if you've done nothing.

Although this is a technical job, social skills still pay off greatly day to day. During your internship, get along well with the colleagues around you, especially the experts. One day they might refer you to a big company, and you can learn things you wouldn't otherwise hear: who the interviewers are, job requirements, what projects teams are working on recently, and so on.

Choose companies that give you a real opportunity and don't waste your time. Don't interview everywhere; before going, find out how well the company matches you.

This is especially true for social recruitment: as soon as you update your resume, lots of people will call you. You need a strategy for interviewing, and you should seize and reflect on every opportunity. I, for example, was all over the place: many applications ended after the first round with no reply, because I didn't carefully reflect and summarize after each interview. The next time I met the same question I was still clueless, which cost me a lot of time and confidence.

A bowl of chicken soup

Everything has just begun, don't rush

AI has only just begun. Why do I say so? In math classes, the textbooks were full of Cauchy, Newton, Gauss and the like; they felt like people from a distant era, very remote. But now the model I use every day was created by Chen Tianqi, who is only a few years older than me. I even follow his social media account, and he lives so naturally in my world. That feeling is wonderful.

Every time I look up papers and literature, reading ones from 2017 makes me feel I'm already late, and I regret learning so slowly; seeing ones from February 2018 makes me feel a little relieved. This also shows that, in the wave of this era's development, everything is just beginning. When opportunities and challenges appear side by side, that is when you are closest to making history. The so-called hot trends and waves don't matter; what matters is that you love it.

Find something you can stick with, and don't stop looking for it

When people do what they love, they shine!

When you keep moving toward something because you truly love it, your soul seems guided, instructed and called by God. You naturally know what to do and what you want to do, as if you were born for it. Sometimes you can't even explain why you do it. Anyone who has read The Moon and Sixpence will understand this sense of mission~

I'm not the smart type; I'm the stubborn type. Once I commit to something, I commit to the end. God knows how many times I've doubted myself and wanted to give up, but I still choose to grit my teeth and believe in myself. That is the meaning of persistence.