2024/06/07 21:11:34

IEEE x ATEC

The IEEE x ATEC Technology Sharing Conference is a technology salon jointly sponsored by the professional technical society IEEE and the cutting-edge technology exploration community ATEC. It invites industry experts and scholars to share cutting-edge exploration and technical practice in order to promote digital development.

As society digitizes and services become ever more networked and intelligent, the risks that arise from these services cannot be ignored. The theme of this session is "Risks and Countermeasures of Internet Fraud": five guests share risks and countermeasures in online fraud scenarios from different technical fields and perspectives.

The following is a speech by researcher Zhuang Fuzhen, "Application of NN Model in Financial Risk Control Scenarios".

Speaker | Zhuang Fuzhen

Researcher at Beihang University Artificial Intelligence Research Institute

ATEC Technology Elite Competition Senior Advisory Committee Expert

"Application of NN Model in Financial Risk Control Scenarios"

I am very happy to attend the IEEE x ATEC Technology Symposium. The topic I am sharing today is "Application of NN Model in Financial Risk Control Scenarios". My talk is divided into three parts: background, our research work, and a summary.

As we all know, the third-party online payment market has developed rapidly over the past decade or so. At the same time, criminal activity related to online transactions has increased significantly, and transaction fraud now seriously threatens the online payment industry. In 2016 alone, the Internet Crime Complaint Center (IC3) reported more than US$1.3 billion in losses, and since its founding it has received nearly 3.8 million complaints. The most common types of online transaction fraud are account takeover and card theft. Account takeover refers to unauthorized account operations or transactions made by a fraudster who has taken control of someone's payment account, usually through compromised credentials. Card theft means that information related to someone's card, such as the card number and billing information, has been obtained by a fraudster and used for unauthorized charges.

Now let me share some of the research we have done jointly with Ant Group. There are three main works: first, user event sequence analysis based on neural hierarchical factorization machines (SIGIR 2020); second, fraud detection based on dual importance-aware factorization machines (AAAI 2021); and third, on the explainability side, cross-domain fraud detection that models user behavior sequences with a hierarchical explainable network (WWW 2020).

1. User event sequence analysis based on neural hierarchical factorization machines

The first work is user event sequence analysis based on neural hierarchical factorization machines. In the payment business, a user registers, logs in, puts the products of their choice into the shopping cart, and finally completes a transaction or payment. Based on these account dynamics, we can judge whether the next payment is fraudulent, and the account dynamics provide rich sequential data. Work that focuses only on feature combinations, or only on sequence information, models user event sequences from a single perspective. In addition, each event is usually encoded only by simple embedding, concatenation, or fully connected layers, which makes it hard to obtain good event representations. We therefore designed a hierarchical model that combines both aspects for fraud detection.

The figure on the right shows two cases. One is a movie-review record on a website (Figure 1), which is also a user behavior sequence. The key contribution here is how to represent each event: as we have just seen, each event actually contains many features.

As shown in Figure 2, an event is described by features x1 to xn, and a user event sequence contains T events, e1 to eT. In our scenario each event has 56 features: 50 categorical and 6 numeric. Combinations of features within an event are often more discriminative for fraud detection than individual features. For example, if a cross-border transaction is made within 1 minute, we can easily judge that it is card theft. We use an FM (factorization machine) model to capture these feature combinations: FM automatically performs second-order feature combination in the embedding space. In the event representation (Figure 2), vi and vj are the embedding vectors of the two features being combined, and xi and xj are the feature values that weight that interaction. Aggregating these pairwise feature interactions gives the representation of the event.
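
The FM-based event encoding described above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: it assumes the bi-interaction pooling form (as popularized by NFM), in which the pairwise terms (x_i v_i) * (x_j v_j) are summed element-wise into one k-dimensional event vector; the function names are hypothetical. The identity 0.5 * ((sum_i x_i v_i)^2 - sum_i (x_i v_i)^2) computes all pairs in O(nk) time instead of O(n^2 k).

```python
import numpy as np

def bi_interaction(x, V):
    """Second-order FM pooling of one event (hypothetical helper).

    x: (n,) feature values (1.0 for an active categorical value,
       the normalized value for a numeric feature)
    V: (n, k) embedding vectors v_i, one per feature
    Returns the k-dim vector sum_{i<j} (x_i v_i) * (x_j v_j), element-wise.
    """
    xv = x[:, None] * V                 # x_i * v_i for every feature
    sum_sq = xv.sum(axis=0) ** 2        # (sum_i x_i v_i)^2
    sq_sum = (xv ** 2).sum(axis=0)      # sum_i (x_i v_i)^2
    return 0.5 * (sum_sq - sq_sum)      # all pairs i<j in O(n k)

def bi_interaction_naive(x, V):
    """Reference O(n^2 k) double loop over feature pairs."""
    n = len(x)
    out = np.zeros(V.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            out += (x[i] * V[i]) * (x[j] * V[j])
    return out
```

Summing the returned vector over its k dimensions recovers the scalar FM interaction term sum_{i<j} <v_i, v_j> x_i x_j.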

Once events are represented, we want a better sequence representation, that is, a better feature representation of the whole sequence. Each user sequence contains multiple events, and combinations of two events can be more discriminative for fraud detection. We also want to account for the order of events: for example, doing event A first and then event B may raise the probability of fraud, and we want the model to capture this effect. For event combinations, the representation S is again computed by a factorization machine over pairs of events, with qi and qj as the weights of the events in each pair. For the order effect we consider two aspects. One is the importance of each event, captured by a self-attention mechanism (Sself); the other is the historical context of each event in the sequence, modeled by a bidirectional LSTM. The sequence representation therefore has three parts: the pairwise combinations of events, the self-attention over events, and the BiLSTM encoding of the events themselves. Combining the three gives the overall sequence representation.

The figure on the right shows the framework we proposed, the neural hierarchical factorization machine. From the bottom up, the event features are encoded to obtain event representations, from which the sequence representation is learned. On top of the sequence representation the model applies a multi-layer perceptron, and in parallel a linear classifier is applied directly to the features. The outputs of these two parts are summed and passed through a sigmoid to obtain an output between 0 and 1. The training objective is the cross-entropy loss over all N labeled samples. This is the framework of our model.
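
A minimal NumPy sketch of the sequence level and the prediction head follows, under stated assumptions: the event-pair term reuses FM pooling with per-event weights q_i, self-attention is plain scaled dot-product attention mean-pooled over events, the BiLSTM branch is omitted to keep the sketch dependency-free, and all parameter names (W1, b1, w2, w_lin, b) are hypothetical. The wide (linear) score and the MLP score are summed before the sigmoid, and training would minimize the binary cross-entropy over the N labeled sequences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_pool(E, q):
    """FM over events: sum_{i<j} (q_i e_i) * (q_j e_j), element-wise."""
    qe = q[:, None] * E
    return 0.5 * (qe.sum(axis=0) ** 2 - (qe ** 2).sum(axis=0))

def self_attention_pool(E):
    """Scaled dot-product self-attention over events, then mean-pooled."""
    k = E.shape[1]
    scores = E @ E.T / np.sqrt(k)                     # (T, T)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)                 # row-wise softmax
    return (A @ E).mean(axis=0)                       # (k,)

def predict(E, q, x_raw, params):
    """Fraud probability for one user sequence (sketch).

    E: (T, k) event representations; q: (T,) event weights;
    x_raw: (d,) raw features for the linear (wide) part.
    """
    s = np.concatenate([pairwise_pool(E, q), self_attention_pool(E)])  # (2k,)
    h = np.maximum(0.0, params["W1"] @ s + params["b1"])                # ReLU hidden layer
    deep = params["w2"] @ h                                             # MLP score
    wide = params["w_lin"] @ x_raw + params["b"]                        # linear score
    return sigmoid(deep + wide)

def bce_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over labeled samples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
```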

In the experiments we used real industrial data: from an e-commerce platform we obtained data sets for three regions. Positive examples are fraudulent behaviors and negative examples are normal transactions. Normal transactions vastly outnumber fraudulent ones, so the classes are highly imbalanced. We also ran experiments on a public movie data set. As baselines we compared with state-of-the-art methods: WD (Wide & Deep), NFM, DeepFM, xDeepFM, and M3, which uses a mixture of models to learn the long-term and short-term dependencies of sequences simultaneously.

Our evaluation metric is recall at a low user-interruption rate, which is what matters most in real industrial scenarios. When we act on the results, we call the top-ranked users to warn them that a transaction may be fraudulent, and every call to a legitimate user is an interruption. For example, if we make 1,000 phone calls, ideally all 1,000 should reach actual fraud cases, so the higher that proportion, the better. The metric therefore focuses on the head of the ROC curve, where the false positive rate (FPR) is small.
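
The metric can be sketched as recall (true positive rate) at a fixed, small FPR. This is an illustrative implementation, not the evaluation code used in the papers; the 0.1% default cut-off is an assumption, since the exact FPR level is not stated here.

```python
import numpy as np

def recall_at_fpr(y_true, scores, max_fpr=0.001):
    """Recall (TPR) when at most `max_fpr` of legitimate users are flagged.

    The threshold is taken from the negatives' scores so that the fraction
    of negatives scoring strictly above it does not exceed max_fpr.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    neg = np.sort(scores[y_true == 0])[::-1]          # negatives, highest first
    n_allowed = int(np.floor(max_fpr * len(neg)))     # negatives we may flag
    # Flag only scores strictly above the (n_allowed + 1)-th highest negative.
    threshold = neg[n_allowed] if n_allowed < len(neg) else -np.inf
    flagged = scores > threshold
    return flagged[y_true == 1].mean()                # share of fraud cases caught
```

For example, with 3 fraud cases and 10 normal users, allowing one interruption (max_fpr=0.1) catches the two fraud cases that outscore the second-highest normal user.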

From the lower-left figure, it can be seen that changes in IP, as well as changes in other feature values and field values, are associated with fraudulent behavior.

2. Fraud detection based on dual importance-aware factorization machines

The second work uses dual importance-aware factorization machines for fraud detection. In the first work we saw that the IP changes constantly, so we need to account for how the value of a given field evolves across a series of events. Both the evolution of field values and the interactions between different field values are important, but existing work does not attend to both at the same time. We therefore designed the DIFM model, which combines the two and is again built on FM. First, along the time direction (the brown direction in Figure 3), we use FM to model how a field such as f1 changes across events; this explicit modeling of field-value evolution is our new contribution. On top of it we propose a Field Importance-aware module, which uses an attention mechanism to learn which fields' evolution matters more for the prediction. In the other direction, for each event the model captures the pairwise interactions between different field values through FM (the blue part in the figure), and an Event Importance-aware module then uses attention to learn which events are more important (the green part). Finally, the model outputs its prediction from the information produced by the Field Importance-aware and Event Importance-aware modules together with the features of the current event. The model is relatively simple and practical, and in this business scenario it can be deployed online efficiently and effectively. This is the second work we proposed.

The experiments for the second work also used the data sets of the three regions from the first work. We added several more baselines: AFM, LSTM4FD (which uses an LSTM for fraud detection), and Latent Cross (which integrates contextual information into an RNN).
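
The two DIFM branches can be sketched as follows, assuming an input tensor of embedded field values with shape (T, F, k) and simple additive attention. The attention form, the use of the summed current-event embedding, and all parameter names (Wf, hf, We, he, w_out, b_out) are hypothetical, so this illustrates the structure rather than reproducing the published model. Field-value evolution applies FM across events for each field; field-value interaction applies FM across fields for each event.

```python
import numpy as np

def fm_pool(Z):
    """Second-order FM pooling over the rows of Z (m, k) -> (k,)."""
    return 0.5 * (Z.sum(axis=0) ** 2 - (Z ** 2).sum(axis=0))

def attention_pool(H, W, h):
    """Additive attention: a_i = softmax(h . tanh(W H_i)); returns (sum_i a_i H_i, a)."""
    scores = np.tanh(H @ W.T) @ h                # (m,)
    scores -= scores.max()                       # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ H, a

def difm_forward(X, params):
    """X: (T, F, k) embedded field values of T events (last row = current event)."""
    T, F, k = X.shape
    # Field-value evolution: FM across the T events, separately for each field.
    field_evo = np.stack([fm_pool(X[:, f, :]) for f in range(F)])   # (F, k)
    # Field-value interaction: FM across the F fields, separately for each event.
    event_int = np.stack([fm_pool(X[t]) for t in range(T)])         # (T, k)
    field_vec, field_w = attention_pool(field_evo, params["Wf"], params["hf"])
    event_vec, event_w = attention_pool(event_int, params["We"], params["he"])
    current = X[-1].sum(axis=0)                                      # current event
    z = np.concatenate([field_vec, event_vec, current])              # (3k,)
    prob = 1.0 / (1.0 + np.exp(-(params["w_out"] @ z + params["b_out"])))
    return prob, field_w, event_w
```

The returned field and event weights are what the importance-aware modules expose for interpretation; each set sums to 1.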

We again evaluate with recall at a low user-interruption rate. The bottom row, DIFM, shows our results, which are clearly superior to all baselines. The ablation study includes DIFM-α, which considers only field-value evolution, and DIFM-β, which considers only field-value interaction; DIFM combines the two sub-models and outperforms both, as well as all previous comparison algorithms. This is a simple and practical algorithm.

In terms of interpretability, the model can also surface high-risk features and high-risk events. In the figure on the upper right, each change in a value is marked with a blue circle. You can see, for example, how the trailing digits of the card number fall into different ranges, and each change coincides with fraudulent behavior or a change of card. IP changes are captured as well. By explicitly modeling how field values change across the events of a sequence, the model detects fraud and also provides a better basis for interpretation. Interpretability is essential in financial fraud detection: when you tell a user that a transaction is fraudulent, you must tell them which features may violate which rules, or which of their events may have led to fraudulent behavior. In the next work, we therefore built a hierarchical interpretability model covering the whole process, at the feature level, the event level, and the cross-domain level: cross-domain fraud detection that models user behavior sequences with a hierarchical explainable network.

3. Cross-domain fraud detection using a hierarchical explainable network to model user behavior sequences

The motivation is relatively simple and direct. First, as shown earlier, user behavior sequences are very important. Second, we want to consider how explainability can help our business. Third, when the e-commerce platform launches business in a new region, the small amount of data there may prevent good modeling. We want to transfer knowledge from other regions or platforms with more mature data or models, and so build a cross-domain fraud detection model.

We propose a hierarchical explainable network. First, it provides feature-level and event-level interpretability for fraud detection. The figure on the right shows the framework. As in the previous work, the features are first encoded; the Field-level Extractor produces event representations, and from these the sequence representation is built. There is another layer, called the Wide layer, which is a linear classifier learned directly on the features. The outputs are concatenated and fed into a multi-layer perceptron. In the single-domain model, interpretability appears in two places: which fields (features) are more important, and which historical events in the sequence are more important.

Step by step: the first step, look-up embedding, transforms feature values into vectors, with separate transformation rules for categorical and numerical features (the formula in the figure). The Field-level Extractor produces event representations. In the previous work we only considered pairwise feature interactions; to show which feature is more important, we add a weight w_i^t, the importance of feature i normalized across the features at time step t. Events likewise have an importance weight, u_t, given by the expression below it. A wide layer learns the linear part. Finally, for prediction we use an MLP followed by a sigmoid to map the output into (0, 1), and the whole model is trained with the cross-entropy loss L(θ).
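
The look-up embedding and the two importance levels (w_i^t over fields at step t, u_t over events) can be sketched like this. The numeric-field rule (scaling a per-field vector by the normalized value) and the dot-product attention with hypothetical context vectors a_field and a_event are assumptions chosen for brevity; the actual formulas are the ones shown in the figure.

```python
import numpy as np

def embed_event(cat_ids, num_vals, cat_tables, num_vecs):
    """Look-up embedding for one event (hypothetical layout).

    cat_ids[i]   : index of the value taken by categorical field i
    cat_tables[i]: (vocab_i, k) embedding table for categorical field i
    num_vals[j]  : value of numeric field j (assumed already normalized)
    num_vecs[j]  : (k,) vector for numeric field j, scaled by the value
    Returns an (n_fields, k) array.
    """
    cat = [cat_tables[i][c] for i, c in enumerate(cat_ids)]
    num = [v * num_vecs[j] for j, v in enumerate(num_vals)]
    return np.stack(cat + num)

def softmax(s):
    s = s - s.max()
    e = np.exp(s)
    return e / e.sum()

def hierarchical_pool(events, a_field, a_event):
    """events: (T, n_fields, k). Returns sequence vector, field weights w[t, i], event weights u[t]."""
    w = np.stack([softmax(ev @ a_field) for ev in events])   # (T, n_fields); each row sums to 1
    event_vecs = (w[:, :, None] * events).sum(axis=1)        # (T, k) weighted field sum per event
    u = softmax(event_vecs @ a_event)                        # (T,) event importance
    seq_vec = u @ event_vecs                                  # (k,) sequence representation
    return seq_vec, w, u
```

Reading off w[t] and u is what yields the feature-level and event-level explanations.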

We also proposed a transfer learning framework. As mentioned earlier, some regions or scenarios have little data while others have much more, and we want the data-rich domains to help the data-poor ones. We call the domain with little data the target domain (Target), and the domain with a large amount of data the source domain (Source). We want to learn the knowledge unique to each of the source and target domains as well as the knowledge they share, so that the source can pass on knowledge that helps the target learn and make predictions. In our scenario the framework has several components: the embedding strategy; domain-shared and domain-specific behavior sequence extractors; domain attention, which explains to some extent how much the source domain has helped with the target problem; and a method for aligning distributions between domains (Aligning Distributions). Interpretability at this level is reflected in the domain attention.
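
A minimal sketch of the domain-attention idea, assuming a softmax over just two candidate representations (domain-shared and domain-specific) scored against a hypothetical attention vector a. The two weights sum to 1, so they can be read as each part's share of the contribution; weights of 0.56 and 0.44, for instance, would mean 56% shared and 44% domain-specific.

```python
import numpy as np

def domain_attention(h_shared, h_specific, a):
    """Mix domain-shared and domain-specific sequence vectors (sketch).

    a: (k,) hypothetical attention vector. Returns the mixed vector and the
    two weights [shared, specific], which sum to 1.
    """
    H = np.stack([h_shared, h_specific])   # (2, k)
    s = H @ a                              # one score per source of knowledge
    s = s - s.max()                        # numerical stability
    alpha = np.exp(s) / np.exp(s).sum()    # softmax over the two sources
    return alpha @ H, alpha
```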

Why do we propose this embedding strategy? The same field can take different values in different regions. For example, consumption amounts in China and Vietnam differ: in China a typical amount might be 0 to 100 yuan, but in Vietnam the range may be quite different. Because field values differ and user behavior habits differ across regions, a single extractor may not work well for two regions at once, so the behavior sequence extractors are split into Domain-Specific and Domain-Shared ones. That is, we transfer domain-invariant knowledge while retaining what is unique to each domain. Domain attention is likewise split into domain-specific and domain-shared parts (the Shared and Specific factors); the calculation formula is shown in the figure. For aligning distributions between domains, traditional alignment methods are not suitable in our scenario because the classes are extremely imbalanced: the positive-to-negative ratio can be as extreme as 1 to 10,000, meaning only one transaction in ten thousand may be abnormal. We therefore propose a class-aware Euclidean distance, which measures the distance between domains separately for each class.
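
The following sketch shows why per-class alignment matters under heavy imbalance. It assumes one plausible form of a class-aware Euclidean distance (the squared distance between source and target class means, summed over classes); the exact formula in the figure may differ, and the function names are hypothetical. With one fraud case per thousand, a single global mean barely moves when the fraud clusters disagree, whereas the class-aware distance exposes the mismatch.

```python
import numpy as np

def class_aware_distance(src, y_src, tgt, y_tgt, classes=(0, 1)):
    """Sum over classes of the squared Euclidean distance between class means."""
    total = 0.0
    for c in classes:
        mu_s = src[y_src == c].mean(axis=0)   # source mean for class c
        mu_t = tgt[y_tgt == c].mean(axis=0)   # target mean for class c
        total += float(np.sum((mu_s - mu_t) ** 2))
    return total

def global_mean_distance(src, tgt):
    """Class-agnostic baseline: squared distance between overall means."""
    return float(np.sum((src.mean(axis=0) - tgt.mean(axis=0)) ** 2))
```

For example, if both domains have 999 normal samples at the origin and one fraud sample at opposite corners, (10, 10) versus (-10, -10), the global-mean distance is only 0.0008 while the class-aware distance is 800.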

We further generalize this into a general transfer learning framework. In the figure on the right, the part inside the dotted line is our hierarchical explainable network, used as the sequence extractor. This extractor can be replaced with other models; for example, other baselines can be plugged into our transfer learning framework as special cases. So to build such a fraud detection model, we only need to define which model serves as the behavior sequence extractor.
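
The plug-in design can be sketched as an interface: the framework only needs a factory that builds a sequence extractor, and it instantiates one shared extractor plus one specific extractor per domain. The class and function names here are hypothetical, and the mean-pooling extractor is only a stand-in for HEN or a baseline such as LSTM4FD.

```python
from typing import Callable

import numpy as np

# An extractor maps a (T, k) array of event vectors to a sequence vector.
Extractor = Callable[[np.ndarray], np.ndarray]

class TransferFraudModel:
    """Hypothetical wrapper: any sequence extractor can be plugged in.

    One extractor is shared by all domains; each domain also gets its own
    specific extractor built by the same factory.
    """

    def __init__(self, make_extractor: Callable[[], Extractor], domains):
        self.shared = make_extractor()
        self.specific = {d: make_extractor() for d in domains}

    def represent(self, events: np.ndarray, domain: str) -> np.ndarray:
        # Concatenate the domain-shared and domain-specific representations.
        return np.concatenate([self.shared(events), self.specific[domain](events)])

def make_mean_pool_extractor() -> Extractor:
    # Stand-in extractor with no parameters: mean-pools the events.
    return lambda events: events.mean(axis=0)
```

Usage: `TransferFraudModel(make_mean_pool_extractor, domains=["source", "target"]).represent(events, "target")` returns the concatenated shared and target-specific vectors; swapping the factory swaps the extractor without touching the rest of the framework.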

Again we used data from an e-commerce platform, this time adding a relatively small data set that may contain only hundreds or thousands of examples, where each positive example may be matched by hundreds of thousands of negatives. We use the region with the least data as the target domain. As baselines, we again choose fraud detection models such as WD, NFM, LSTM4FD, and M3R. Let's first look at the single-domain results, again evaluated by recall at a low user-interruption rate.

These two figures show the results in four regions, C1, C2, C3, and C4. Our model, shown by the final vertical line, performs much better than the baselines.

We also applied our transfer learning framework to all base models, replacing the sequence behavior extractor inside the dotted line with each baseline's extractor. The blue lines show the results after applying the transfer learning framework, and they show that transfer learning yields better results. The horizontal axis is the amount of training data, from less to more, for example from one week to two or three weeks of data; as the training data grows, the results generally improve. Throughout, the blue lines lie well above the results of the original models.

Turning to interpretability: at the feature level, the darker the color in each row, the more important the feature, and you can see that the model clearly identifies an important feature. Along the vertical Y-axis, the darker the color, the more important the event, so the model also identifies the importance of different events. Below, the Domain-Shared weight is 0.56, meaning that when we build the Target model, the shared part contributes 56% of the knowledge and the target domain itself 44%. The model thus provides explanations at three levels of granularity: features, events, and domains.

The proposed model has been deployed online in the ATO (account takeover) scenario of the e-commerce website. It provides account transaction risk analysis, identification, and prevention, along with event-level and attribute-level weight analysis that helps operations staff reconstruct the risk path.

Finally, a summary. Through this collaboration, we proposed neural hierarchical factorization machines for user event sequence analysis; a fraud detection model that jointly models field-value evolution and the interactions between fields; and a general, interpretable transfer learning framework that makes our fraud detection results explainable. We have also carried out online deployment and application. The models now work quite well, especially in scenarios where our algorithms are integrated into the fraud detection module. That's all for my talk. Thank you very much.

Leifeng.com
