tensor - Tensor Operations

Multi-dimensional arrays and matrix operations, SIMD-optimized

Speedup: 2.5x

SIMD throughput: 70 GB/s

Optimization: AVX2

Block size: 32×32

Overview

The tensor module provides high-performance operations on multi-dimensional arrays. Its API is modeled on NumPy and PyTorch, so it should feel familiar to users of those libraries.

Key Features

Multi-dimensional arrays: support for 1D, 2D, 3D, and N-D tensors
SIMD optimization: up to 70 GB/s throughput with AVX2
Blocked matrix multiplication: cache-optimized, up to 2.5x speedup
Broadcasting: NumPy-style shape compatibility
Memory efficient: view and slice support

Performance

Operation	Size	Before	After	Speedup
Matrix multiplication	256×256	0.97 GFLOPS	2.47 GFLOPS	2.5x
Matrix multiplication	512×512	1.30 GFLOPS	2.28 GFLOPS	1.8x
Element-wise add	10K elements	8.75 GB/s	70.2 GB/s	8x
Element-wise mul	10K elements	8.2 GB/s	65.9 GB/s	8x

Quick Start

Creating Tensors

kullan ai/tensor;

// Filled with zeros
değişken a = Tensor::sıfırlar([3, 4]);  // 3×4 matrix

// Filled with ones
değişken b = Tensor::birler([2, 3, 4]);  // 2×3×4 tensor

// Random values
değişken c = Tensor::rastgele([100, 50]);

// From explicit values
değişken d = Tensor::yeni([2, 2], [1.0, 2.0, 3.0, 4.0]);

Basic Operations

kullan ai/tensor;

değişken x = Tensor::yeni([3, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]);
değişken y = Tensor::yeni([3, 3], [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]);

// Element-wise operations (SIMD-optimized)
değişken toplam = x.topla(y);      // Addition
değişken fark = x.çıkar(y);        // Subtraction
değişken çarpım = x.çarp(y);       // Multiplication
değişken bölüm = x.böl(y);         // Division

// Scalar operations
değişken scaled = x.skaler_çarp(2.5);     // Multiply every element by 2.5
değişken shifted = x.skaler_topla(10.0);  // Add 10.0 to every element

Matrix Multiplication (Blocked Algorithm)

kullan ai/tensor;

// Small matrices (naive algorithm)
değişken A = Tensor::rastgele([32, 32]);
değişken B = Tensor::rastgele([32, 64]);
değişken C1 = A.matmul(B);  // 32×64 result

// Large matrices (blocked algorithm, 32×32 blocks)
değişken M = Tensor::rastgele([256, 256]);
değişken N = Tensor::rastgele([256, 256]);
değişken P = M.matmul(N);  // About 2.5x faster than the naive path (see Performance)

yazdir("Result shape: {}×{}", P.boyut(0), P.boyut(1));

Broadcasting and Reshape

kullan ai/tensor;

// Broadcasting
değişken x = Tensor::yeni([3, 1], [1.0, 2.0, 3.0]);
değişken y = Tensor::yeni([1, 4], [1.0, 2.0, 3.0, 4.0]);
değişken z = x.topla(y);  // Result shape: [3, 4]

// Reshape (the element count must stay the same: 12 = 3*4 = 2*2*3)
değişken a = Tensor::aralık(0.0, 12.0, 1.0);  // [12] shape
değişken b = a.yeniden_şekillendir([3, 4]);   // [3, 4] shape
değişken c = a.yeniden_şekillendir([2, 2, 3]); // [2, 2, 3] shape

// Transpose
değişken d = b.transpose();  // [4, 3] shape
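
The shape rule behind the broadcasting example above ([3, 1] with [1, 4] gives [3, 4]) can be sketched in Rust as follows. This is a minimal illustration of the NumPy-style rule, not the module's actual implementation; the helper name broadcast_shape is hypothetical.

```rust
/// Computes the NumPy-style broadcast of two shapes, or `None` if they are
/// incompatible. `broadcast_shape` is a hypothetical helper name used only
/// for illustration; it is not part of the tensor module's documented API.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0usize; rank];
    for i in 0..rank {
        // Align shapes from the trailing dimension; a missing leading
        // dimension is treated as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x, // equal sizes are kept
            (1, y) => y,           // a size-1 dimension stretches to match
            (x, 1) => x,
            _ => return None,      // any other mismatch cannot broadcast
        };
    }
    Some(out)
}
```

Under this rule, broadcast_shape(&[3, 1], &[1, 4]) yields [3, 4], while shapes such as [2, 3] and [4] are rejected because the trailing dimensions 3 and 4 differ and neither is 1.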

Advanced Operations

Reduction Operations

kullan ai/tensor;

değişken x = Tensor::yeni([3, 4], [/*...*/]);

// Over all elements
değişken toplam = x.sum();           // Scalar
değişken ortalama = x.mean();        // Scalar
değişken max = x.max();              // Scalar
değişken min = x.min();              // Scalar

// Along a given axis
değişken satır_toplamı = x.sum_eksen(0);    // [4] shape: sums over axis 0 (down the rows)
değişken sütun_toplamı = x.sum_eksen(1);    // [3] shape: sums over axis 1 (across the columns)
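
To make the result shapes of the axis reductions above concrete, here is a small Rust sketch for the 2-D case on a row-major buffer. The function name sum_axis_2d and its signature are assumptions made for illustration; the module's internal reduction code is not shown in this document.

```rust
/// Sums a row-major `[rows, cols]` buffer along `axis`.
/// Axis 0 collapses the rows and returns `cols` values;
/// axis 1 collapses the columns and returns `rows` values.
/// `sum_axis_2d` is a hypothetical name, not the module's API.
fn sum_axis_2d(data: &[f32], rows: usize, cols: usize, axis: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols, "buffer length must match shape");
    match axis {
        0 => {
            let mut out = vec![0.0f32; cols];
            for r in 0..rows {
                for c in 0..cols {
                    out[c] += data[r * cols + c];
                }
            }
            out
        }
        1 => (0..rows)
            .map(|r| data[r * cols..(r + 1) * cols].iter().sum::<f32>())
            .collect(),
        _ => panic!("axis must be 0 or 1 for a 2-D tensor"),
    }
}
```

For a [3, 4] tensor this returns 4 values for axis 0 and 3 values for axis 1, matching the shapes noted in the example.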

Mathematical Functions

kullan ai/tensor;

değişken x = Tensor::yeni([2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);

// Element-wise functions
değişken exp_x = x.exp();        // e^x
değişken log_x = x.log();        // ln(x)
değişken sqrt_x = x.sqrt();      // √x
değişken pow_x = x.pow(2.0);     // x²

// Trigonometric and hyperbolic
değişken sin_x = x.sin();
değişken cos_x = x.cos();
değişken tanh_x = x.tanh();

Technical Details

SIMD Optimization

256-bit vector operations using the AVX2 instruction set:

8 floats processed in parallel (256 bits / 32 bits = 8)
Runtime CPU feature detection: is_x86_feature_detected!("avx2")
Automatic scalar fallback when AVX2 is unavailable
70 GB/s throughput measured on 10K elements
Remainder handling: the last <8 elements are processed with scalar code
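
The points above (runtime detection, 8-wide processing, scalar fallback, scalar remainder) can be combined in a short Rust sketch. It uses the standard std::arch intrinsics and the is_x86_feature_detected! macro mentioned in the list; the function names add_f32, add_f32_scalar, and add_f32_avx2 are assumptions, and the module's real kernel may differ.

```rust
/// Element-wise `out[i] = a[i] + b[i]`, using 256-bit vectors when the CPU
/// supports AVX2. All function names here are hypothetical.
pub fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len(), "length mismatch");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was just confirmed at runtime.
            unsafe { add_f32_avx2(a, b, out) };
            return;
        }
    }
    // Fallback for CPUs without AVX2 and for non-x86_64 targets.
    add_f32_scalar(a, b, out);
}

fn add_f32_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for i in 0..out.len() {
        out[i] = a[i] + b[i];
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_f32_avx2(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    // Largest multiple of 8 that fits: 8 f32 lanes per 256-bit register.
    let full = out.len() / 8 * 8;
    let mut i = 0;
    while i < full {
        // Unaligned loads and stores, so the slices need no special alignment.
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        i += 8;
    }
    // Remainder: fewer than 8 elements are left, handled with scalar code.
    add_f32_scalar(&a[full..], &b[full..], &mut out[full..]);
}
```

Keeping the AVX2 path in a separate function marked with target_feature lets the rest of the program be compiled for a baseline CPU while the vectorized kernel is only entered after the runtime check succeeds.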

Blocked Matrix Multiplication

Cache-optimized tiling algorithm:

Block size: 32×32 (32 × 32 × 4 bytes = 4 KB per block of f32 values)
L1 cache: 32 KB (8 blocks fit)
Optimization: reduces the cache miss rate
Threshold: enabled for matrices of size ≥64
Speedup: 2.5x for 256×256, 1.8x for 512×512
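
A tiled multiplication with 32×32 blocks can be sketched in Rust as below. This is a generic textbook-style tiling over row-major f32 buffers, written under the assumption that the module works along similar lines; the name matmul_blocked, the loop order, and the omission of the ≥64 threshold dispatch are choices made for this illustration only.

```rust
/// Tile edge length: 32 * 32 * 4 bytes = 4 KB per f32 tile.
const BLOCK: usize = 32;

/// Computes C = A * B for row-major A `[m, k]` and B `[k, n]`, one tile at a
/// time. `matmul_blocked` is a hypothetical name used for illustration.
fn matmul_blocked(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must hold m*k values");
    assert_eq!(b.len(), k * n, "B must hold k*n values");
    let mut c = vec![0.0f32; m * n];
    for i0 in (0..m).step_by(BLOCK) {
        for p0 in (0..k).step_by(BLOCK) {
            for j0 in (0..n).step_by(BLOCK) {
                // Clamp tile ends so edge tiles of non-multiple sizes work.
                let i1 = (i0 + BLOCK).min(m);
                let p1 = (p0 + BLOCK).min(k);
                let j1 = (j0 + BLOCK).min(n);
                for i in i0..i1 {
                    for p in p0..p1 {
                        let a_ip = a[i * k + p];
                        // Innermost loop walks contiguous rows of B and C.
                        for j in j0..j1 {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

At any moment the inner loops touch one tile each of A, B, and C, about 12 KB in total, which fits in the 32 KB L1 cache cited above. That reuse of resident tiles is where the reduction in cache misses comes from.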

Memory Layout

// Tensor internal structure
struct Tensor {
    data: Vec<f32>,        // Row-major order
    shape: Vec<usize>,     // Dimensions
    strides: Vec<usize>,   // Memory strides
}

// Example: [2, 3] tensor
// data: [a, b, c, d, e, f]
// shape: [2, 3]
// strides: [3, 1]
// Access: data[i*strides[0] + j*strides[1]]
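
The stride arithmetic in the example above generalizes to any number of dimensions. The following Rust sketch derives row-major strides from a shape and turns a multi-dimensional index into a flat offset; row_major_strides and offset are hypothetical helper names, not functions documented for the module.

```rust
/// Row-major strides for `shape`: the last dimension is contiguous
/// (stride 1) and each earlier stride is the product of all later sizes.
/// Hypothetical helper, shown only to illustrate the layout.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1usize; shape.len()];
    for d in (0..shape.len().saturating_sub(1)).rev() {
        strides[d] = strides[d + 1] * shape[d + 1];
    }
    strides
}

/// Flat position of `index` in the data buffer: sum of index[d] * strides[d].
fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For shape [2, 3] this gives strides [3, 1], so element (1, 2) sits at offset 1*3 + 2*1 = 5, which is f in the layout above. Because indexing depends only on shape and strides, an operation such as a view or transpose could in principle reuse the same data with different strides, though this document does not state whether the module does so.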

API Reference

Tensor::yeni(shape, data)

Creates a new tensor from a shape and a flat list of values.

değişken t = Tensor::yeni([2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);

Tensor::sıfırlar(shape)

Creates a tensor filled with zeros.

değişken t = Tensor::sıfırlar([100, 50]);

tensor.matmul(other)

Matrix multiplication (uses the blocked algorithm for sizes ≥64).

değişken C = A.matmul(B);  // [m×k] × [k×n] = [m×n]

tensor.topla(other)

Element-wise addition (SIMD-optimized).

değişken c = a.topla(b);  // Up to 70 GB/s throughput with AVX2

Platform Support

Platform	SIMD	Matmul	Broadcasting
x86_64 (Intel/AMD)	✓ AVX2	✓	✓
ARM64 (Apple Silicon)	✓ NEON	✓	✓
ARM32	Scalar	✓	✓
WASM	✓ SIMD128	✓	✓

BERK Programming Language