Text2CAD: An AI-powered CAD/CAM assistant that integrates advanced Visual-Spatial Intelligence (VSI)
The project addresses the specific challenges of CAD/CAM tasks, focusing on text-3D interpretation and innovative solution generation. Our goal is to develop a system capable of:
- Interpreting design specifications (textual or parametric)
- Performing spatial reasoning and 3D visualization
- Suggesting innovative design solutions
In practice, your data might come from:
- Existing 3D CAD models (e.g., .STEP, .STL, .OBJ, etc.)
- Synthetic data generation
- Text-based design specs or parametric definitions
For demonstration, let's assume we've collected a small set of 3D parts and their textual descriptions. We'll show how you might load them and perform basic preprocessing.
# For data manipulation
import os
import numpy as np
import pandas as pd
# For geometry processing
import open3d as o3d
import trimesh
# For deep learning & LLMs
import torch
from transformers import AutoModel, AutoTokenizer
# We'll simulate having a list of 3D object files
cad_files = [
"part1.obj",
"part2.obj",
"assembly1.obj"
]
# Placeholder: if these files don't exist, skip loading
loaded_meshes = {}
for f in cad_files:
if os.path.exists(f):
mesh = trimesh.load(f)
loaded_meshes[f] = mesh
print(f"Loaded {f} with {len(mesh.vertices)} vertices and {len(mesh.faces)} faces.")
else:
print(f"File not found: {f} - skipping...")
# Example text-based design specs
text_specs = {
"part1.obj": "A mechanical bracket with holes at each corner for screws...",
"part2.obj": "A gear-like part with radial spokes, 100 mm outer diameter...",
"assembly1.obj": "An assembly combining part1 and part2 with specific constraints..."
}
print("\nData acquisition & preprocessing complete.")
For advanced spatial reasoning, we need 3D representations that are machine-readable. We can use point clouds, meshes, or parametric shapes depending on the tasks. Let's demonstrate a simple geometric operation:
def compute_bounding_box(mesh: trimesh.Trimesh):
bbox = mesh.bounding_box
return bbox.primitive.extents # (length, width, height)
# Example bounding box calculation
for filename, mesh in loaded_meshes.items():
bbox_dims = compute_bounding_box(mesh)
print(f"{filename} bounding box (L, W, H): {bbox_dims}")
print("\nBaseline 3D representation system established.")
We want to combine language-based instructions/feedback with 3D geometry. We'll use a placeholder LLM (like GPT-2 or any open-source large model) to parse text specs and generate suggestions.
model_name = "DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
def generate_text_prompt(prompt: str, max_length=50):
# A simplistic placeholder for LLM text generation
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model(**inputs)
# We'll just return the hidden state shape for demonstration
return outputs.last_hidden_state.shape
# Example usage
prompt_example = "Suggest possible design constraints for a gear with 100 mm outer diameter"
response_shape = generate_text_prompt(prompt_example)
print(f"LLM response (placeholder shape): {response_shape}")
print("\nLLM component initialized.")
Visual-Spatial Intelligence (VSI) involves tasks like:
- Spatial layout optimization
- Constraint satisfaction
- 3D manipulation and transformation
We'll set up a placeholder function that "analyzes" the 3D object and returns recommended transformations based on textual instructions.
def spatial_reasoning_module(mesh: trimesh.Trimesh, instruction: str):
"""
Placeholder for a module that uses geometry + LLM-based reasoning to
propose transformations or checks on the mesh.
"""
# For demonstration, we'll do a simple check:
# if "reduce weight" in instruction -> scale model down
recommended_transformation = None
if "reduce weight" in instruction.lower():
# Suggest a scale factor of 0.9
recommended_transformation = "Scale mesh by 0.9"
elif "enlarge" in instruction.lower():
recommended_transformation = "Scale mesh by 1.1"
else:
recommended_transformation = "No transformation recommended"
return recommended_transformation
# Example usage
for filename, mesh in loaded_meshes.items():
instruction = "We want to reduce weight by optimizing geometry."
transform = spatial_reasoning_module(mesh, instruction)
print(f"For {filename}: {transform}")
print("\nIntegrated VSI placeholder module executed.")
Here, we demonstrate how one might loop through multiple rounds of instructions, combining text-based feedback and geometry updates to converge on a design solution.
design_instructions = [
"Reduce the thickness of outer edges.",
"Add holes to minimize weight.",
"Ensure the part can withstand 200 N load on the bracket."
]
def iterative_design_process(mesh: trimesh.Trimesh, instructions: list):
for idx, instr in enumerate(instructions):
# Use text interpretation
text_info = generate_text_prompt(instr)
# Use spatial reasoning
transform_suggestion = spatial_reasoning_module(mesh, instr)
# Log the decisions (in real usage, you'd apply the transform)
print(f"Iteration {idx+1}: {instr}")
print(f" - LLM interpretation shape: {text_info}")
print(f" - Spatial transform suggestion: {transform_suggestion}")
print("--------------------------------------------------")
for filename, mesh in loaded_meshes.items():
print(f"=== Iterative Design Process for {filename} ===")
iterative_design_process(mesh, design_instructions)
print()
print("\nIterative design suggestions demonstrated.")
We can evaluate our AI assistant in multiple ways:
- Geometric metrics: volume, surface area, bounding box, collision checks
- Performance metrics: time to generate suggestions, solution quality
- User feedback: subjective satisfaction or trust in the AI suggestions
- Prototype tasks: e.g., can the AI propose valid constraints?
Below, we show a placeholder for some geometric metrics.
def evaluate_mesh(mesh: trimesh.Trimesh):
# Compute volume, surface area
vol = mesh.volume
area = mesh.area
return vol, area
for filename, mesh in loaded_meshes.items():
vol, area = evaluate_mesh(mesh)
print(f"{filename}: volume={vol:.2f}, surface_area={area:.2f}")
print("\nEvaluation placeholders complete.")
-
Advanced 3D Feature Extraction:
- Integrate libraries that handle parametric definitions and advanced geometry (e.g., FreeCAD, Onshape, or commercial APIs).
-
Fine-Tuning or Instruction-Tuning LLMs:
- Explore instruction-tuned large language models that better handle domain-specific vocabulary and design language.
-
Multimodal Fusion:
- Create a more direct pipeline that fuses 3D geometry embeddings with language embeddings. For instance, use vision transformers adapted to 3D or point cloud networks.
-
Human-in-the-Loop Approaches:
- Provide interactive UIs in Jupyter or specialized CAD frontends for real-time user feedback and incremental design modifications.
-
Trust & Explainability:
- Develop methods for explaining the AI’s suggestions, highlighting critical geometry changes or constraints.
-
Real-time Collaboration:
- Eventually integrate these capabilities into real design environments, enabling multiple engineers to interact with the AI system simultaneously.