Every morning, the moment I open my eyes, I face a hard problem: what should I wear today? Plenty of options pop into my head, but none of them satisfies me, and I often drift back to sleep while thinking them over. Twenty minutes later I realize I have overslept, jump out of bed, throw on a printed T-shirt and shorts, and rush out the door. In my hurry, I have even dripped toothpaste on myself...

So in the eyes of my colleagues, I have always been a slovenly programmer, and I have indeed not disappointed anyone: I usually show up in a printed T-shirt and slippers. I joke that I am already married anyway, so there is no one left to dress up for. But I never give up, and every morning I still ask myself: what should I wear today? It seems to be a problem I can never solve, yet I am not willing to walk around it.

I have done a lot of data analysis and recommendation work, and whenever I see a problem I feel the urge to collect data to solve it. So I came up with this fanciful idea: use data analysis to solve the thing that bothers me every morning when I get up, so that I can go to work happy and confident~

I sorted out the overall process of using data to solve problems at work:

  1. Clearly define the problem to be solved
  2. Collect and clean the data
  3. Define indicators and perform statistical calculations
  4. Segment and drill down into the indicators, observe the data and draw conclusions
  5. Pick some typical cases for detailed analysis
  6. Optimize the strategy based on the conclusions of steps 4 and 5
  7. Apply the optimized strategy and keep observing the defined indicators

Step 4 in particular still involves many detailed questions, such as whether the indicators meet expectations, and what hypotheses to form and verify for the problems encountered.

Write it down, stick it on the wall, and get to work. Every time I start a new project like this, I feel a mixture of excitement and nervousness.

Data analysis is an exciting thing. Lots of ideas come to mind, and you need to organize them, otherwise it is easy to go off track midway.

Until you see the data, you never know what the conclusion will be. Does the data match what I expected? If not, what is the reason, and what hypotheses need to be made and verified?

As a result, I am sometimes excited but often disappointed. The biggest fear is not that the conclusion fails to match expectations, but that after a long search no useful conclusion turns up at all. Then we can only accept that having no conclusion for now is also a conclusion. Keep the data in mind; maybe one day it will spark an idea.

It really is work where logic, reason and inspiration collide!

Clearly define the problems that need to be solved

In fact, it is not that I have no clothes. There may not be a huge number, but they still fill most of the wardrobe. When I first started earning my own money, I also splurged on a lot of Taobao best-sellers. But the feeling of having nothing to wear never seemed to go away.

Let’s sort it out:

  • I often feel dissatisfied with the clothes currently available.
  • I don’t know how to shop: I keep buying, yet it never seems to be enough.

From the perspective of recommendation strategy, the wardrobe can be considered as our candidate pool. Various occasions and seasons in life represent the needs of users with different characteristics (actually it’s all me, who changes in different situations).

For example: (workday / at work / spring / exercising after work; I want something simple and bright; the sequence of outfits I wore over the past few days (xxxxx); the sequence of clothes currently in the wash (xxxxx)) or (weekend / taking the children to the park / summer / lots of running, jumping and photos; I want something easy to move in and photogenic...)

Recommendation effect: if I agonize for a long time, or feel that I don't have enough clothes, the effect needs to be improved.

Here, both the outfit selection strategy and the evaluation indicator (whether I personally feel satisfied) are fairly subjective and hard to quantify. After all, women are so complicated that I can't even understand myself.

And every time we are dissatisfied with an outfit, we conclude that it is because we have nothing to wear, that is, the pool (the clothes) is insufficient. So the problem we hope to solve is: given a fixed distribution strategy and evaluation indicator, how do we optimize the pool to improve the effect?

Of course, since the pool was itself built from my own purchase decisions, the problem becomes: how to optimize the strategy for building the pool (buying clothes). After all, I often hesitate longer when buying clothes than when choosing what to wear. If I had a clear understanding of what kind of clothes I need, it would save a lot of effort.

Data collection and cleaning

This step covers building the basic data and cleaning it. Clean data is always the most important thing.

2.1 Basic data construction

Basic data: each piece of clothing and its attributes. The attributes make later statistics and drill-downs convenient, and each piece is photographed for case-by-case analysis. If this analysis took me a whole weekend, 80% of the workload was here.

I smoothed out every piece of clothing in the closet, photographed it, added tags, and organized everything into an Excel sheet.

In line with the goals of the analysis, the tags cover the factors I consider when deciding to buy clothes, the factors I consider when deciding what to wear, and finally whether the clothes actually get worn. The tags are:

  • Type (vest, short-sleeved, pajamas, sweatshirt, jumpsuit, etc.)
  • Season (spring/autumn, summer, winter)
  • Time of purchase (student days, after starting work, within one year)
  • Purchase channel (mall, Taobao, given by others)
  • Color (floral, gray, striped...)
  • Degree of specialness (special, a bit special, regular)
  • Wearing frequency (high, medium, low, gradually lower, never want to wear it again)
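For readers who prefer code to Excel, here is a minimal sketch of how one row of such a sheet could be modeled in Python. The class name, field names and example values are illustrative assumptions based on the tag list above, not the actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Garment:
    """One row of the wardrobe sheet (hypothetical schema)."""
    name: str            # hypothetical identifier, e.g. the photo file name
    kind: str            # "vest", "short-sleeved", "sweatshirt", ...
    season: str          # "summer", "winter", ...
    purchase_time: str   # "student days", "after work", "within one year"
    channel: str         # "mall", "Taobao", "gift"
    color: str           # "floral", "gray", "striped", ...
    specialness: str     # "special", "a bit special", "regular"
    frequency: str       # "high", "medium", "low", "gradually lower", "never again"

    @property
    def eliminated(self) -> bool:
        # A garment counts as eliminated when it is tagged "never again".
        return self.frequency == "never again"


# Made-up example row, for illustration only.
example = Garment("striped_tee.jpg", "short-sleeved", "summer",
                  "within one year", "Taobao", "striped", "regular", "high")
print(example.eliminated)
```

Keeping the "never want to wear it again" tag as a derived boolean makes it easy to compute rates over any other tag later on.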

Actually, I wanted to tag more: who was I with when I bought it? What was the main purpose of the purchase? Did I try it on first? But I really had no energy left. Recalling the whole life story of every piece of clothing is exhausting.

2.2 Dirty data processing

If you don't look at some samples in advance or do some simple validation, dirty data can easily trip you up. A very small number of highly abnormal values is often enough to bias indicators such as the mean.

I excluded some clothes, mainly: clothes that elders insisted on giving me because they thought they would suit me, and clothes bought for special events that can't be worn a second time, such as performance costumes. These clothes were not chosen by me, so they are left out of the analysis for now.
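The exclusion rule above can be sketched as a simple filter. The records and the origin labels below are made up for illustration; the only point is that rows which do not reflect my own purchase decisions are dropped before any indicator is computed.

```python
# Hypothetical records: (garment name, how it was acquired).
wardrobe = [
    ("gray sweatshirt", "bought by me"),
    ("floral blouse", "gift from an elder"),
    ("stage costume", "bought for a one-off event"),
    ("denim jacket", "bought by me"),
]

# Origins that do not reflect my own purchase decisions (assumed labels).
EXCLUDED_ORIGINS = {"gift from an elder", "bought for a one-off event"}


def clean(records):
    """Keep only the garments that I chose myself."""
    return [r for r in records if r[1] not in EXCLUDED_ORIGINS]


cleaned = clean(wardrobe)
print([name for name, _ in cleaned])
```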

Define indicators and perform statistical calculations

3.1 Quantity

Simple and intuitive, quantity is also the most important indicator of a recommendation pool. After all, the complaint of “never having enough clothes” is about quantity. The main tools here are comparison and segmentation. Since the total is certainly not small, any feeling of shortage must be concentrated in certain sub-categories of tags. Segmentation and comparison are how we find those tags.

Let’s look at the total first. In fact, I don’t know whether my total is a lot or a little. This is a common problem in data analysis: many numbers need an overall average or a point of comparison before you can judge their size. For some data, long-term observation of the business gives you a rough sense of the average and the distribution, so you can judge a number at a glance. For example, I already have a feel for metrics such as the penetration rate of each tab of Cloud Music.

But I have no information about how many clothes other people own or how they are distributed, so I can only make a rough estimate. The total is 99 items, counting everything: tops and pants, outerwear and innerwear.

There are three seasons, so each season has about 30 pieces. Split evenly between upper and lower body, that is about 15 tops and 15 bottoms per season. About 15 pieces to cover 4 months is not a lot (scratching my head guiltily), or at least not excessive.
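The back-of-the-envelope estimate can be written out as a few lines of arithmetic. The variable names are illustrative, and the even split across seasons and between tops and bottoms is the simplifying assumption made in the text; the text rounds the results down to about 30 and 15.

```python
total_items = 99   # total number of items from the text
seasons = 3        # the wardrobe is split into three seasons

months_per_season = 12 / seasons      # 4 months per season
per_season = total_items / seasons    # 33 items, "about 30" in the text
per_half = per_season / 2             # 16.5 items, "about 15" tops or bottoms

print(per_season, per_half, months_per_season)
```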

I did a simple drill-down and comparison on the quantity indicator (a very simple way to reach conclusions). Summer has the most clothes and winter the fewest, which matches the climate in the south.

Whenever we look at a piece of data, we already have a rough prediction in mind. For the seasonal data, for example, we could judge from the climate beforehand that summer should have the most. When the data matches our expectations, that also serves as a check on the data's accuracy.

When the data does not match our expectations, we need to pay attention and investigate further.

Looking at purchase time, clothes bought in the past 10 years make up the vast majority. New clothes account for 33%, and 22% are from 7 years ago. There are even a few pieces bought as an undergraduate more than 10 years ago. It seems I haven't gained much weight.

With wearing frequency ordered from low to high, the distribution leans to the left: there really are a lot of clothes that I rarely wear (ones I don't care for). Given my feeling of “never having the right clothes”, the goal is to shift this distribution to the right.

Most of my clothes were bought in malls, where I can take home whatever I like on the spot.

There are few formal clothes, which has to do with my personal style and the lack of formal occasions in my life. This is in line with expectations.

I then did some simple crosses between dimensions and drew some further conclusions.

Low wearing frequency is most serious for spring clothes, where there are few pieces I like. Winter clothes are still worn fairly often.

  • Occasion crossed with season: summer really is a romantic season, with more holiday styles. One formal outfit for each of the three seasons is just right, so next time I see something formal I don't need to spend time deliberating.
  • Occasion crossed with degree of specialness: there are more special clothes for holidays and more regular clothes for weekdays, which is reasonable.
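A drill-down on one dimension and a cross of two dimensions can both be done with simple counting. The sketch below uses `collections.Counter` from the Python standard library; the tag values and rows are made up for illustration and are not the real wardrobe data.

```python
from collections import Counter

# Hypothetical (season, occasion) tags, one tuple per garment.
garments = [
    ("summer", "holiday"), ("summer", "holiday"), ("summer", "daily"),
    ("spring/autumn", "daily"), ("spring/autumn", "formal"),
    ("winter", "daily"), ("winter", "formal"), ("summer", "formal"),
]

# Single-dimension drill-down: how many garments per season.
by_season = Counter(season for season, _ in garments)

# Two-dimension cross: how many garments per (season, occasion) cell.
crosstab = Counter(garments)

for (season, occasion), n in sorted(crosstab.items()):
    print(f"{season:14s} {occasion:8s} {n}")
```

With a real sheet, a spreadsheet pivot table gives the same result; the point is only that each cell of the cross is a plain count.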

Another point that can't be ignored is how well clothes match. Clothes that don't go together are a big headache when choosing an outfit. So I analyzed the ratio of tops to bottoms, leaving out dresses and jumpsuits, which don't need matching.

An unbalanced ratio of tops to bottoms shows up: for spring, there are 11.5 tops for every pair of pants. Versatile bottoms such as jeans are very scarce and need to be restocked.

The analysis of the quantity indicators gave me a better understanding of my wardrobe: which categories need replenishing and which are sufficient.

Besides quantity, quality matters a lot. Girls buy clothes more or less constantly, so why do they still feel they don't have enough? To find out, I focused on the clothes I never want to wear again. What do they look like? Time to learn from failure.

3.2 Elimination rate

Elimination rate = clothes I no longer want to wear / all clothes

“Clothes bought and never worn” are the biggest pain in my heart. They take up space, leave me with nothing to wear, and cost money, and on top of that I get told: look how many clothes are in the closet, how can you still say you have nothing to wear?

Analyzing the characteristics of clothes with a high elimination rate helps me avoid the same pitfalls and gives me some guidance the next time I am torn over a purchase. Again, segmentation by dimension and comparison are the main tools.

The overall elimination rate is 30%. Nearly one-third of the wardrobe being dead weight is a fairly high proportion.
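As a minimal sketch of this indicator, the elimination rate is just a share, and drilling down means computing the same share inside each group. The function names, the dictionary keys and the sample rows are hypothetical and do not reproduce the real numbers.

```python
from collections import defaultdict


def elimination_rate(garments):
    """Share of garments tagged as never to be worn again."""
    if not garments:
        return 0.0
    return sum(g["eliminated"] for g in garments) / len(garments)


def rate_by(garments, key):
    """Elimination rate inside each value of one dimension, e.g. season."""
    groups = defaultdict(list)
    for g in garments:
        groups[g[key]].append(g)
    return {value: elimination_rate(rows) for value, rows in groups.items()}


# Made-up sample, for illustration only.
sample = [
    {"season": "summer", "eliminated": False},
    {"season": "summer", "eliminated": False},
    {"season": "summer", "eliminated": True},
    {"season": "winter", "eliminated": True},
    {"season": "winter", "eliminated": False},
]

print(elimination_rate(sample))  # 0.4
print(rate_by(sample, "season"))
```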

Broken down by season, the rate is especially high in winter. Although many winter clothes are worn frequently, there are also many that I no longer want to wear, and some of them need to be cleared out.

I want to discuss a question here: with so many dimensions, how do we choose which to drill into? For large-scale, high-dimensional data we can use machine learning methods: specify the elimination rate as the target, then compute the contribution of each feature.

But in data analysis, interpretability matters a great deal, and much of the data is used to verify our hypotheses. There is no need to predict accurately or train a model. (Of course, if you do use a model, you would generally look at the high-contribution features to see whether they match expectations and whether they offer any inspiration.)

Therefore, the preferred dimensions for drilling down are those most likely to differentiate the indicator, those that can verify a hypothesis, or those with special meaning in the scenario. For example, many of the quantity drill-downs follow the “season” dimension, because it has special meaning: clothes for different seasons cannot be worn interchangeably. Drilling into this dimension first makes problems easier to find.

For the elimination rate, the dimension to drill into first, both the most likely to differentiate and one that can verify a hypothesis, is purchase time.

Is there a direct relationship between the clothes I no longer want to wear and how old they are? If I don't want to wear something simply because I bought it long ago, then the problem is not with the purchase decision.

From highest to lowest, the elimination rate by purchase time is: graduate school, after starting work, undergraduate, within one year.

So the elimination rate is not simply lower for newer clothes: clothes from my undergraduate years have a lower rate than those bought after I started working. Does this mean my taste was better back then? One thing to note is that only 5% of the wardrobe was bought during my undergraduate years.

The reason is easy to imagine: the undergraduate clothes were bought ten years ago, and the ones that have survived to this day are probably my favorites. If all my undergraduate clothes had been kept until now, their elimination rate would certainly be much higher.

Clothes bought within the past year have the lowest elimination rate; I have hit relatively few aesthetic pitfalls recently. Still, this exposes an unfair point in the elimination rate indicator: clothes bought in the past year will obviously have a low rate.

So if a category of clothes has a low elimination rate, it is not necessarily thanks to my wise and discerning decisions. It may simply be that I bought a lot recently and clothes from the past year make up a large share.

As we saw earlier, summer clothes have a low elimination rate. Is that because more summer clothes were bought within the past year? Crossing season with purchase time:

Both for clothes bought within the past year and for those bought more than a year ago, the elimination rate is lower for summer than for spring and autumn, and it is especially low within the past year. Summer mostly means short sleeves, so it is hard to go wrong.
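The check described here, holding purchase time fixed and then comparing seasons, can be sketched as a rate computed per (season, purchase time) cell. The rows below are invented so that the pattern matches the conclusion in the text; they are not the real data, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (season, purchase_time, eliminated).
rows = [
    ("summer", "within one year", False),
    ("summer", "within one year", False),
    ("summer", "within one year", False),
    ("summer", "over a year ago", True),
    ("summer", "over a year ago", False),
    ("summer", "over a year ago", False),
    ("spring/autumn", "within one year", True),
    ("spring/autumn", "within one year", False),
    ("spring/autumn", "over a year ago", True),
    ("spring/autumn", "over a year ago", True),
    ("spring/autumn", "over a year ago", False),
]


def stratified_rates(rows):
    """Elimination rate for each (season, purchase_time) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [eliminated, total]
    for season, bought, eliminated in rows:
        cell = counts[(season, bought)]
        cell[0] += int(eliminated)
        cell[1] += 1
    return {cell: e / n for cell, (e, n) in counts.items()}


rates = stratified_rates(rows)
for (season, bought), rate in sorted(rates.items()):
    print(f"{season:14s} {bought:16s} {rate:.2f}")
```

If summer stays lower than spring/autumn inside every purchase-time group, the low summer rate cannot be explained by recent purchases alone.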

Winter clothing deserves attention: the elimination rate for purchases within the past year is higher than for purchases made more than a year ago. Although some winter clothes are worn frequently, the ones bought recently are also more likely to end up never being worn. I need to shop rationally for the time being.

Purchase channel is another important dimension. Recently, online shopping has taken up a larger and larger share. What is disturbing is that clothes purchased online actually have a higher elimination rate than clothes given to me by others.

In terms of style, more unusual clothes are more likely to be eliminated, while regular clothes are relatively safe, which is consistent with common sense. Special styles for spring need particular care, because their elimination rate is sky-high. More variety in summer is not a big problem.

Typical case specific analysis

After getting a general picture of which dimensions have a relatively high failure rate, I wanted to imprint the bad cases on my mind and learn from each one. For every piece of clothing I never want to wear again, I noted the reason and what happened to it. Using trace-back thinking, here are the cases one by one:

Output conclusion: Clothes buying strategy

In summary, this weekend produced the following strategies:

1. Denim trousers are badly needed;
2. Go to the mall to try on and buy winter clothes. Winter clothes are worn constantly, so a failed purchase is a real risk;
3. Summer clothes are sufficient and satisfaction is high, so purchases can be postponed (the occasional online purchase is just icing on the cake);
4. Don't buy flashy spring clothes, which basically never get worn;
5. Resolutely return unsuitable clothes when shopping online. Unflattering clothes bought online are the number one reason for elimination.

Continue to observe the data as decisions change

It is very important not to produce scattered one-off numbers but to build an analysis system. The indicators that proved able to expose problems during the analysis should be kept and tracked, because they are crucial for observing the business situation and the effect of strategy changes.

Updating the original data after the purchasing strategy above is put into practice, observing how the indicators change, and adjusting direction in time are the keys to maintaining the "ecological health" of the wardrobe. However, time is limited, and collecting and entering the original data is a bit overwhelming. I hope I can persevere.