合併排序因其可靠性和可預測的性能而在排序演算法中脫穎而出。讓我們通過清晰的示例和實際應用來分解它在 Python 中的工作原理。
了解歸併排序的核心概念
歸併排序遵循分而治之的方法:
1. 將列表一分為二
2. 遞歸地對每一半進行排序
3. 合併已排序的兩半
下面是演示這些步驟的基本實現:
def merge_sort(arr):
# Base case: lists of length 0 or 1 are already sorted
if len(arr) <= 1:
return arr
# Divide the array into two halves
mid = len(arr) // 2
left = arr[:mid]
right = arr[mid:]
# Recursively sort each half
left = merge_sort(left)
right = merge_sort(right)
# Merge the sorted halves
return merge(left, right)
def merge(left, right):
result = []
left_idx, right_idx = 0, 0
# Compare elements from both lists and merge in sorted order
while left_idx < len(left) and right_idx < len(right):
if left[left_idx] <= right[right_idx]:
result.append(left[left_idx])
left_idx += 1
else:
result.append(right[right_idx])
right_idx += 1
# Add remaining elements from either list
result.extend(left[left_idx:])
result.extend(right[right_idx:])
return result
# Example usage
numbers = [64, 34, 25, 12, 22, 11, 90]
sorted_numbers = merge_sort(numbers)
print(f"Original: {numbers}")
print(f"Sorted: {sorted_numbers}")
添加視覺進度跟蹤
讓我們修改我們的實現,逐步展示演算法的工作原理:
def merge_sort_with_steps(arr, depth=0):
print(" " * depth + f"Splitting: {arr}")
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = arr[:mid]
right = arr[mid:]
left = merge_sort_with_steps(left, depth + 1)
right = merge_sort_with_steps(right, depth + 1)
result = merge(left, right)
print(" " * depth + f"Merged: {result}")
return result
# Example with smaller list for clarity
numbers = [38, 27, 43, 3, 9, 82, 10]
print("Starting merge sort...")
sorted_numbers = merge_sort_with_steps(numbers)
print(f"Final result: {sorted_numbers}")
此版本顯示了每個拆分和合併作,可幫助您了解演算法的進展情況。
處理自定義對象
實際應用程序通常需要對更複雜的數據進行排序。以下是使用歸併排序對對象進行排序的方法:
class Student:
def __init__(self, name, grade):
self.name = name
self.grade = grade
def __repr__(self):
return f"Student(name='{self.name}', grade={self.grade})"
def merge_sort_objects(arr, key_func):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = arr[:mid]
right = arr[mid:]
left = merge_sort_objects(left, key_func)
right = merge_sort_objects(right, key_func)
return merge_objects(left, right, key_func)
def merge_objects(left, right, key_func):
result = []
left_idx, right_idx = 0, 0
while left_idx < len(left) and right_idx < len(right):
if key_func(left[left_idx]) <= key_func(right[right_idx]):
result.append(left[left_idx])
left_idx += 1
else:
result.append(right[right_idx])
right_idx += 1
result.extend(left[left_idx:])
result.extend(right[right_idx:])
return result
# Example usage with Student objects
students = [
Student("Alice", 85),
Student("Bob", 92),
Student("Charlie", 78),
Student("Diana", 95),
Student("Eve", 88)
]
# Sort by grade
sorted_by_grade = merge_sort_objects(students, lambda x: x.grade)
print("\nSorted by grade:")
for student in sorted_by_grade:
print(f"{student.name}: {student.grade}")
# Sort by name
sorted_by_name = merge_sort_objects(students, lambda x: x.name)
print("\nSorted by name:")
for student in sorted_by_name:
print(f"{student.name}: {student.grade}")
優化內存使用情況
對於大型列表,我們可以實現使用較少內存的就地版本:
def merge_sort_inplace(arr, start=0, end=None):
if end is None:
end = len(arr)
if end - start <= 1:
return
mid = (start + end) // 2
merge_sort_inplace(arr, start, mid)
merge_sort_inplace(arr, mid, end)
merge_inplace(arr, start, mid, end)
def merge_inplace(arr, start, mid, end):
# Create temporary arrays
left = arr[start:mid]
right = arr[mid:end]
i = start # Index for main array
l = 0 # Index for left array
r = 0 # Index for right array
# Merge elements back into main array
while l < len(left) and r < len(right):
if left[l] <= right[r]:
arr[i] = left[l]
l += 1
else:
arr[i] = right[r]
r += 1
i += 1
# Add remaining elements
while l < len(left):
arr[i] = left[l]
l += 1
i += 1
while r < len(right):
arr[i] = right[r]
r += 1
i += 1
# Example usage with memory tracking
import tracemalloc
def demonstrate_memory_usage():
# Generate a larger list for testing
import random
data = [random.randint(1, 1000) for _ in range(10000)]
# Track memory for regular merge sort
tracemalloc.start()
sorted_data1 = merge_sort(data.copy())
snapshot1 = tracemalloc.get_snapshot()
total1 = sum(stat.size for stat in snapshot1.statistics('lineno'))
tracemalloc.stop()
# Track memory for in-place merge sort
tracemalloc.start()
data_copy = data.copy()
merge_sort_inplace(data_copy)
snapshot2 = tracemalloc.get_snapshot()
total2 = sum(stat.size for stat in snapshot2.statistics('lineno'))
tracemalloc.stop()
print(f"Regular merge sort memory usage: {total1 / 1024:.2f} KB")
print(f"In-place merge sort memory usage: {total2 / 1024:.2f} KB")
demonstrate_memory_usage()
實際應用程序:處理日誌文件
下面是使用合併排序處理伺服器日誌條目的實際示例:
from datetime import datetime
class LogEntry:
def __init__(self, timestamp, level, message):
self.timestamp = timestamp
self.level = level
self.message = message
def __repr__(self):
return f"{self.timestamp.strftime('%Y-%m-%d %H:%M:%S')} [{self.level}] {self.message}"
def process_logs(log_files):
"""Merge and sort logs from multiple files."""
all_logs = []
# Read logs from each file
for file_name in log_files:
with open(file_name) as f:
for line in f:
# Parse log line (simplified format)
parts = line.strip().split(' ', 2)
if len(parts) == 3:
timestamp = datetime.strptime(parts[0], '%Y-%m-%d')
level = parts[1]
message = parts[2]
all_logs.append(LogEntry(timestamp, level, message))
# Sort logs by timestamp
sorted_logs = merge_sort_objects(all_logs, lambda x: x.timestamp)
return sorted_logs
# Example usage:
"""
Sample log files content format:
2024-01-15 ERROR Database connection failed
2024-01-15 INFO Server started
2024-01-15 WARN High memory usage
"""
log_files = ['server1.log', 'server2.log', 'server3.log']
# sorted_logs = process_logs(log_files) # Commented out as files don't exist
性能注意事項
歸併排序具有幾個特性,使其適用於不同的方案:
1. 時間複雜度: O(n log n) — 始終如一,與輸入無關
2. 空間複雜度:O(n) 為普通版本,O(1) 為原位
3. 穩定性:保持相等元素的相對順序
這是一個簡單的基準測試函數:
import time
import random
def benchmark_sorting(size=10000, runs=5):
times = []
for _ in range(runs):
# Generate random data
data = [random.randint(1, 10000) for _ in range(size)]
# Time the sort
start = time.perf_counter()
merge_sort(data.copy())
end = time.perf_counter()
times.append(end - start)
avg_time = sum(times) / len(times)
print(f"Average time for {size} elements: {avg_time:.4f} seconds")
print(f"Best time: {min(times):.4f} seconds")
print(f"Worst time: {max(times):.4f} seconds")
# Run benchmark
benchmark_sorting()
結論
當您需要時,合併排序是一個很好的選擇:
- 可預測的性能(始終為 O(n log n))
- 穩定性(保持相等元素的順序)
- 外部排序(處理大文件時)
此處提供的代碼示例為在您自己的項目中實現歸併排序奠定了堅實的基礎。從基本實施開始,然後隨著需求的增長遷移到優化版本。