Python 中的归并排序:分步指南

2025年02月17日03:42:06 科技 1846


Python 中的归并排序:分步指南 - 天天要闻

合并排序因其可靠性和可预测的性能而在排序算法中脱颖而出。让我们通过清晰的示例和实际应用来分解它在 Python 中的工作原理。

了解归并排序的核心概念

归并排序遵循分而治之的方法:
1. 将列表一分为二
2. 递归地对每一半进行排序
3. 合并已排序的两半

下面是演示这些步骤的基本实现:

def merge_sort(arr):
    # Base case: lists of length 0 or 1 are already sorted
    if len(arr) <= 1:
        return arr
    
    # Divide the array into two halves
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
    
    # Recursively sort each half
    left = merge_sort(left)
    right = merge_sort(right)
    
    # Merge the sorted halves
    return merge(left, right)

def merge(left, right):
    result = []
    left_idx, right_idx = 0, 0
    
    # Compare elements from both lists and merge in sorted order
    while left_idx < len(left) and right_idx < len(right):
        if left[left_idx] <= right[right_idx]:
            result.append(left[left_idx])
            left_idx += 1
        else:
            result.append(right[right_idx])
            right_idx += 1
    
    # Add remaining elements from either list
    result.extend(left[left_idx:])
    result.extend(right[right_idx:])
    return result

# Example usage
numbers = [64, 34, 25, 12, 22, 11, 90]
sorted_numbers = merge_sort(numbers)
print(f"Original: {numbers}")
print(f"Sorted: {sorted_numbers}")

添加视觉进度跟踪

让我们修改我们的实现,逐步展示算法的工作原理:

def merge_sort_with_steps(arr, depth=0):
    print("  " * depth + f"Splitting: {arr}")
    
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
    
    left = merge_sort_with_steps(left, depth + 1)
    right = merge_sort_with_steps(right, depth + 1)
    
    result = merge(left, right)
    print("  " * depth + f"Merged: {result}")
    return result

# Example with smaller list for clarity
numbers = [38, 27, 43, 3, 9, 82, 10]
print("Starting merge sort...")
sorted_numbers = merge_sort_with_steps(numbers)
print(f"Final result: {sorted_numbers}")

此版本显示了每个拆分和合并作,可帮助您了解算法的进展情况。

处理自定义对象

实际应用程序通常需要对更复杂的数据进行排序。以下是使用归并排序对对象进行排序的方法:

class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade
    
    def __repr__(self):
        return f"Student(name='{self.name}', grade={self.grade})"

def merge_sort_objects(arr, key_func):
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
    
    left = merge_sort_objects(left, key_func)
    right = merge_sort_objects(right, key_func)
    
    return merge_objects(left, right, key_func)

def merge_objects(left, right, key_func):
    result = []
    left_idx, right_idx = 0, 0
    
    while left_idx < len(left) and right_idx < len(right):
        if key_func(left[left_idx]) <= key_func(right[right_idx]):
            result.append(left[left_idx])
            left_idx += 1
        else:
            result.append(right[right_idx])
            right_idx += 1
    
    result.extend(left[left_idx:])
    result.extend(right[right_idx:])
    return result

# Example usage with Student objects
students = [
    Student("Alice", 85),
    Student("Bob", 92),
    Student("Charlie", 78),
    Student("Diana", 95),
    Student("Eve", 88)
]

# Sort by grade
sorted_by_grade = merge_sort_objects(students, lambda x: x.grade)
print("\nSorted by grade:")
for student in sorted_by_grade:
    print(f"{student.name}: {student.grade}")

# Sort by name
sorted_by_name = merge_sort_objects(students, lambda x: x.name)
print("\nSorted by name:")
for student in sorted_by_name:
    print(f"{student.name}: {student.grade}")

优化内存使用情况

对于大型列表,我们可以实现使用较少内存的就地版本:

def merge_sort_inplace(arr, start=0, end=None):
    if end is None:
        end = len(arr)
    
    if end - start <= 1:
        return
    
    mid = (start + end) // 2
    merge_sort_inplace(arr, start, mid)
    merge_sort_inplace(arr, mid, end)
    merge_inplace(arr, start, mid, end)

def merge_inplace(arr, start, mid, end):
    # Create temporary arrays
    left = arr[start:mid]
    right = arr[mid:end]
    
    i = start  # Index for main array
    l = 0      # Index for left array
    r = 0      # Index for right array
    
    # Merge elements back into main array
    while l < len(left) and r < len(right):
        if left[l] <= right[r]:
            arr[i] = left[l]
            l += 1
        else:
            arr[i] = right[r]
            r += 1
        i += 1
    
    # Add remaining elements
    while l < len(left):
        arr[i] = left[l]
        l += 1
        i += 1
    
    while r < len(right):
        arr[i] = right[r]
        r += 1
        i += 1

# Example usage with memory tracking
import tracemalloc

def demonstrate_memory_usage():
    # Generate a larger list for testing
    import random
    data = [random.randint(1, 1000) for _ in range(10000)]
    
    # Track memory for regular merge sort
    tracemalloc.start()
    sorted_data1 = merge_sort(data.copy())
    snapshot1 = tracemalloc.get_snapshot()
    total1 = sum(stat.size for stat in snapshot1.statistics('lineno'))
    tracemalloc.stop()
    
    # Track memory for in-place merge sort
    tracemalloc.start()
    data_copy = data.copy()
    merge_sort_inplace(data_copy)
    snapshot2 = tracemalloc.get_snapshot()
    total2 = sum(stat.size for stat in snapshot2.statistics('lineno'))
    tracemalloc.stop()
    
    print(f"Regular merge sort memory usage: {total1 / 1024:.2f} KB")
    print(f"In-place merge sort memory usage: {total2 / 1024:.2f} KB")

demonstrate_memory_usage()

实际应用程序:处理日志文件

下面是使用合并排序处理服务器日志条目的实际示例:

from datetime import datetime

class LogEntry:
    def __init__(self, timestamp, level, message):
        self.timestamp = timestamp
        self.level = level
        self.message = message
    
    def __repr__(self):
        return f"{self.timestamp.strftime('%Y-%m-%d %H:%M:%S')} [{self.level}] {self.message}"

def process_logs(log_files):
    """Merge and sort logs from multiple files."""
    all_logs = []
    
    # Read logs from each file
    for file_name in log_files:
        with open(file_name) as f:
            for line in f:
                # Parse log line (simplified format)
                parts = line.strip().split(' ', 2)
                if len(parts) == 3:
                    timestamp = datetime.strptime(parts[0], '%Y-%m-%d')
                    level = parts[1]
                    message = parts[2]
                    all_logs.append(LogEntry(timestamp, level, message))
    
    # Sort logs by timestamp
    sorted_logs = merge_sort_objects(all_logs, lambda x: x.timestamp)
    return sorted_logs

# Example usage:
"""
Sample log files content format:
2024-01-15 ERROR Database connection failed
2024-01-15 INFO Server started
2024-01-15 WARN High memory usage
"""

log_files = ['server1.log', 'server2.log', 'server3.log']
# sorted_logs = process_logs(log_files)  # Commented out as files don't exist

性能注意事项

归并排序具有几个特性,使其适用于不同的方案:

1. 时间复杂度: O(n log n) — 始终如一,与输入无关
2. 空间复杂度:O(n) 为普通版本,O(1) 为原位
3. 稳定性:保持相等元素的相对顺序

这是一个简单的基准测试函数:

import time
import random

def benchmark_sorting(size=10000, runs=5):
    times = []
    
    for _ in range(runs):
        # Generate random data
        data = [random.randint(1, 10000) for _ in range(size)]
        
        # Time the sort
        start = time.perf_counter()
        merge_sort(data.copy())
        end = time.perf_counter()
        
        times.append(end - start)
    
    avg_time = sum(times) / len(times)
    print(f"Average time for {size} elements: {avg_time:.4f} seconds")
    print(f"Best time: {min(times):.4f} seconds")
    print(f"Worst time: {max(times):.4f} seconds")

# Run benchmark
benchmark_sorting()

结论

当您需要时,合并排序是一个很好的选择:
- 可预测的性能(始终为 O(n log n))
- 稳定性(保持相等元素的顺序)
- 外部排序(处理大文件时)

此处提供的代码示例为在您自己的项目中实现归并排序奠定了坚实的基础。从基本实施开始,然后随着需求的增长迁移到优化版本。

科技分类资讯推荐

风靡市场!CEWEY DS18无线吸尘器火爆全网!以性能赢得用户口碑! - 天天要闻

风靡市场!CEWEY DS18无线吸尘器火爆全网!以性能赢得用户口碑!

近期,家居清洁领域迎来一款极具竞争力的新品——CEWEY DS18无线吸尘器。DS18一经上线,便以其超规格的吸力参数、场景化的设计思路,以及覆盖多类清洁难题的解决方案,在多个平台迅速走红,成为“百元价位段高性能吸尘器”的代表之一。据多位家电行业分析人士指出,DS18的推出,不仅是CEWEY品牌在清洁赛道上的一次技术释放...
享道出行完成C轮13亿元融资,创近三年行业融资新纪录 - 天天要闻

享道出行完成C轮13亿元融资,创近三年行业融资新纪录

5月9日,上汽集团移动出行战略品牌享道出行宣布完成超13亿元C轮融资。这是国内出行行业近三年来单笔融资金额最大的一次,享道出行也将继续保持“车企资源、技术底座和场景生态”一体化上的行业领先地位。C轮融资完成,享道出行进一步明晰了个人出行、企业出行、未来出行三大主线并行,技术服务双轮驱动的发展战略,将从深化...
一卡通考勤门禁道闸系统主要技术模块 - 天天要闻

一卡通考勤门禁道闸系统主要技术模块

一卡通考勤门禁道闸系统的主要技术模块包括以下几种:一卡通考勤门禁道闸系统  1、人事系统:该一卡通考勤门禁道闸系统主要包括部门管理设置、人员管理设置和卡管理。部门管理设置用于设置公司的主要架构;人员管理设置用于录入人员信息并分配部门;卡管理
Meta发布开源项目《North Star》, 展示Quest顶尖视觉与交互 - 天天要闻

Meta发布开源项目《North Star》, 展示Quest顶尖视觉与交互

近日,Meta 宣布开源项目《North Star》(北极星)正式上线,通过 Meta Quest 头显呈现了一场在 MR 场景下的视觉盛宴与交互新体验。目前,用户可前往 Meta Horizon 商店免费下载这一项目。据悉,《North Star》精心打造了一个沉浸式冒险世界,玩家将化身为航海与探索黄金时代的“北极星号”新晋水手。在这片浩瀚无垠的虚拟...
从陪跑个体到企业培训,我的IP陪跑之路,越走越宽了 - 天天要闻

从陪跑个体到企业培训,我的IP陪跑之路,越走越宽了

大家好,我是Tina。来继续通过文章,分享我的自媒体创业生涯。来说说最近在干的事儿。一今天给江南布衣的全国经销商做了小红书的业务辅导培训。很难想象6年的时间,我从一名职场人,慢慢成长为一个自媒体人,然后成为超级个体,到最后一步步做到可以给企